Abstract. Let $A_1, \ldots, A_K$ be $d \times n$ matrices. We define the
row product of these matrices as a $d^K \times n$ matrix whose rows
are entry-wise products of rows of $A_1, \ldots, A_K$. This construction
arises in certain computer science problems. We study the
question of to what extent the spectral and geometric properties of
the row product of independent random matrices resemble those
properties for a $d^K \times n$ matrix with independent random entries.
In particular, we show that the largest and the smallest singular
values of these matrices are of the same order, as long as $n \ll d^K$.

We also consider the problem of privately releasing summary
information about a database, and use the previous results to obtain
a bound for the minimal amount of noise that has to be
added to the released data to avoid a privacy breach.



1. Introduction 

This paper discusses spectral and geometric properties of a certain 
class of random matrices with dependent rows, which are constructed 
from random matrices with independent entries. Such constructions 
first appeared in computer science, in the study of privacy protection 
for contingency tables. The behavior of the extreme singular values of
various random matrices with dependent entries has been extensively
studied in recent years [1], [2], [9], [16], [22]. These matrices arise in
asymptotic geometric analysis [1], signal processing [2], [16], statistics
[22], etc. The row products studied below also originated in a
computer science problem [9].

For two matrices with the same number of columns we define the row
product as a matrix whose rows consist of entry-wise products of the
rows of the original matrices.

Definition 1.1. Let $x$ and $y$ be $1 \times n$ matrices. Denote by $x \otimes_r y$ the
$1 \times n$ matrix whose entries are products of the corresponding entries
of $x$ and $y$: $x \otimes_r y\,(j) = x(j) \cdot y(j)$. If $A$ is an $N \times n$ matrix, and $B$ is



Key words and phrases. Random matrices, extreme singular values, privacy 
protection. 

Research was supported in part by NSF grants DMS-0907023 and DMS-1161372. 


MARK RUDELSON 



an $M \times n$ matrix, denote by $A \otimes_r B$ an $NM \times n$ matrix whose rows
are entry-wise products of the rows of $A$ and $B$:

$$(A \otimes_r B)_{(j-1)M+k} = A_j \otimes_r B_k,$$

where $(A \otimes_r B)_i$, $A_j$, $B_k$ denote rows of the corresponding matrices.
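
To make Definition 1.1 concrete, here is a minimal Python sketch of the row product of two matrices stored as lists of rows. The function name `row_product` and the row ordering (the row built from $A_j$ and $B_k$ gets index $(j-1)M+k$, counting from 1) are assumptions made for this illustration only.

```python
from typing import List

Matrix = List[List[float]]


def row_product(a: Matrix, b: Matrix) -> Matrix:
    """Return the row product of a (N x n) and b (M x n) as an NM x n matrix.

    Each output row is the entry-wise product of one row of a and one row of b.
    Rows are ordered so that the pair (a[j], b[k]) lands at index j * M + k
    (0-based), an ordering assumed for this sketch.
    """
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("both matrices must have the same number of columns")
    return [[x * y for x, y in zip(row_a, row_b)] for row_a in a for row_b in b]


# A small example with +-1 entries: A is 2 x 3, B is 3 x 3, so C is 6 x 3.
A = [[1, -1, 1], [1, 1, -1]]
B = [[1, 1, 1], [-1, 1, 1], [1, -1, -1]]
C = row_product(A, B)
```

Row products of more than two matrices can be obtained by applying the same function repeatedly, since the entry-wise product is associative.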

Row products arise in a number of computer science related prob- 
lems. They have been introduced in [7] and studied in [21] in the theory 
of probabilistic automata. They also appeared in compressed sensing, 
see [3] and [B], as well as in privacy protection problems [H]. These 
papers use different notation for the row product; we adopt the one 
from [6]. 

This paper considers spectral and geometric properties of row products
of a finite number of independent random matrices. The definition
above assumes a certain order of the rows of the matrix $A \otimes_r B$. This
order, however, is not important, since changing the relative positions
of rows of a matrix doesn't affect its eigenvalues and singular values.
Therefore, to simplify the notation, we will denote the row of the matrix
$C = A \otimes_r B$ corresponding to the rows $A_j$ and $B_k$ by $C_{j,k}$. We will
use a similar convention for the rows of the row products of more than
two matrices.

Recall that the singular values of an $N \times n$ random matrix $A$ are the
eigenvalues of $(A^*A)^{1/2}$ written in non-increasing order: $s_1(A) \ge
s_2(A) \ge \ldots \ge s_n(A) \ge 0$. The first and the last singular values
have a clear geometric meaning: $s_1(A)$ is the norm of $A$, considered
as a linear operator from $\ell_2^n$ to $\ell_2^N$, and if $n \le N$ and $\mathrm{rank}(A) = n$,
then $s_n(A)$ is the reciprocal of the norm of $A^{-1}$ considered as a linear
operator from $\ell_2^N \cap A\mathbb{R}^n$ to $\ell_2^n$. The quantity $\kappa(A) = s_1(A)/s_n(A)$,
called the condition number of $A$, controls the error level and the rate
of convergence of many algorithms in numerical linear algebra. The
matrices with bounded condition number are "nice" embeddings of $\mathbb{R}^n$
into $\mathbb{R}^N$, i.e. they don't significantly distort the Euclidean structure.
This property holds, in particular, for random $N \times n$ matrices with
independent centered subgaussian entries having unit variance, as long
as $N \gg n$.

Obviously, the row product of several matrices is a submatrix of their
tensor product. This fact, however, doesn't provide much information
about the spectral properties of the row product, since they can be
different from those of the tensor product. In particular, for random
matrices, the spectra of $A \otimes B$ and $A \otimes_r B$ are, indeed, very different.
For example, let $d < n < d^2$, and consider $d \times n$ matrices $A$ and $B$ with
independent $\pm 1$ random values. The spectrum of $A \otimes B$ is the product
of spectra of $A$ and $B$, so the norm of $A \otimes B$ will be of the order

$$O\big((\sqrt{n} + \sqrt{d})^2\big) = O(n),$$

and the last singular value is $O\big((\sqrt{n} - \sqrt{d})^2\big)$, see [17]. On the other
hand, computer experiments show that the extreme singular values of
the row product behave as for the $d^2 \times n$ matrix with independent
entries, i.e. the first singular value is

$$O(d + \sqrt{n}) = O(d),$$

and the last one is $O(d - \sqrt{n})$, see [9]. Based on this data, it was conjectured
that the extreme singular values of the row product of several
random matrices behave like those of matrices with independent entries.
This fact was established in [H] up to logarithmic terms, whose powers
depended on the number of multipliers. We remove these logarithmic
terms in Theorems 1.3 and 1.5 for row products of any fixed number
of random matrices with independent bounded entries. To formulate
these results more precisely, we introduce a class of uniformly bounded
random variables, whose variances are uniformly bounded below. To
shorten the notation we summarize their properties in the following
definition.

Definition 1.2. Let $\delta > 0$. We will call a random variable $\xi$ a $\delta$
random variable if $|\xi| \le 1$ a.s., $\mathbb{E}\xi = 0$, and $\mathbb{E}\xi^2 \ge \delta^2$.

We start with an estimate of the norm of the row product of random
matrices with independent $\delta$ random entries.

Theorem 1.3. Let $A_1, \ldots, A_K$ be $d \times n$ matrices with independent $\delta$
random entries. Then the $K$-times entry-wise product $A_1 \otimes_r A_2 \otimes_r
\cdots \otimes_r A_K$ is a $d^K \times n$ matrix satisfying

$$\mathbb{P}\left(\|A_1 \otimes_r \cdots \otimes_r A_K\| \ge C'(d^{K/2} + n^{1/2})\right) \le \exp\left(-c\left(d + \frac{n}{d^{K-1}}\right)\right).$$

The constants $C'$, $c$ may depend upon $K$ and $\delta$.

The paper [S] uses an e-net argument to bound the norm of the row 
product. This is one of the sources of the logarithmic terms in the 
bound. To eliminate these terms, we use a different approach. The
expectation of the norm is bounded using the moment method, which
is one of the standard tools of random matrix theory. The moment
method allows one to bound the probability as well. However, the estimate
obtained this way would be too weak for our purposes. Instead, we
apply the measure concentration inequality for convex functions, which
is derived from Talagrand's measure concentration theorem.

The bound for the norm in Theorem 1.3 is the same as for a $d^K \times n$
random matrix with bounded or subgaussian i.i.d. entries, while the
probability estimate is significantly weaker than in the independent
case. Nevertheless, the estimate of Theorem 1.3 is optimal both in
terms of the norm bound and the probability (see Remarks 5.3 and
5.5 for details). In the case $d^K \ge n$, which is the important one for us, the
assertion of Theorem 1.3 reads

$$\mathbb{P}\left(\|A_1 \otimes_r \cdots \otimes_r A_K\| \ge C'\sqrt{d^K}\right) \le \exp(-cd).$$

It is well known that with high probability a random $N \times n$ matrix
$A$ with independent identically distributed bounded centered random
entries has a bounded condition number, whenever $N \gg n$ (see, e.g.
[15]). Our next result shows that the same happens for the row products
of random matrices as well. For the next theorem we need the
iterated logarithmic function.

Definition 1.4. For $q \in \mathbb{N}$ define the function $\log_{(q)} : (0, \infty) \to \mathbb{R}$ by
induction.

(1) $\log_{(1)} t = \max(\log t, 1)$;

(2) $\log_{(q+1)} t = \log_{(1)}(\log_{(q)} t)$.
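
As an illustration of Definition 1.4, the following short Python sketch evaluates the iterated logarithm by applying the truncated logarithm $q$ times. The function name `iterated_log` is introduced here for illustration only.

```python
import math


def iterated_log(q: int, t: float) -> float:
    """Evaluate log_(q) t, where log_(1) t = max(log t, 1) and
    log_(q+1) t = log_(1)(log_(q) t)."""
    if q < 1:
        raise ValueError("q must be a natural number")
    if t <= 0:
        raise ValueError("t must be positive")
    value = t
    for _ in range(q):
        # After the first step the value is at least 1, so log is defined.
        value = max(math.log(value), 1.0)
    return value
```

Because of the truncation at 1, the function never drops below 1, so the denominator $\log_{(q)} d$ in the theorems below is always well defined and positive.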

Throughout the paper we assume that the constants appearing in 
various inequalities may depend upon the parameters K, q, 6, but are 
independent of the size of the matrices, and the nature of random 
variables. 

Theorem 1.5. Let $K, q, n, d$ be natural numbers. Assume that

$$n \le \frac{c d^K}{\log_{(q)} d}.$$

Let $A_1, \ldots, A_K$ be $d \times n$ matrices with independent $\delta$ random entries.
Then the $K$-times entry-wise product $A_1 \otimes_r A_2 \otimes_r \cdots \otimes_r A_K$ satisfies

$$\mathbb{P}\left(s_n(A_1 \otimes_r \cdots \otimes_r A_K) \le c'\sqrt{d^K}\right) \le C \exp(-cd).$$

This bound, together with the norm estimate above, shows that the
condition number of the row product of matrices with $\delta$ random entries
exceeds a constant with probability $O(\exp(-cd))$. While this probability
is close to 0, it is much bigger than that for a $d^K \times n$ random
matrix with independent random entries, in which case it is of order
$\exp(-d^K)$. However, it is easy to show that this estimate is optimal
(see Remarks 5.3 and 8.2). This weak probability bound renders standard
approaches to singular value estimates unusable. In particular,
the size of a $(1/2)$-net on the sphere $S^{n-1}$ is exponential in $n$, so the
union bound in the $\varepsilon$-net argument breaks down.

This weaker bound not only makes the proofs more technically involved,
but also leads to qualitative effects which cannot be observed in
the context of random matrices with independent entries. One of the
main applications of random matrices in asymptotic geometric analysis
is to finding roughly Euclidean or almost Euclidean sections of
convex bodies. In particular, the classical theorem of Kashin [H] states
that a random section of the unit ball of $\ell_1^N$ by a linear subspace of
dimension proportional to $N$ is roughly Euclidean. The original proof
of Kashin used a random $\pm 1$ matrix to construct these sections. The
optimal bounds were obtained by Gluskin, who used random Gaussian
matrices [5].

The particular structure of the $\ell_1$ norm plays no role in this result,
and it can be extended to a larger class of convex bodies. Let $D \subset \mathbb{R}^N$
be a convex symmetric body such that $B_2^N \subset D$ and define the volume
ratio jin] of $D$ by

$$\mathrm{vr}(D) = \left(\frac{\mathrm{vol}(D)}{\mathrm{vol}(B_2^N)}\right)^{1/N}.$$

Assume that the volume ratio of $D$ is bounded: $\mathrm{vr}(D) \le V$. Then for
a random $N \times n$ matrix $A$ with independent entries satisfying certain
conditions,

$$\mathbb{P}\left(\exists x \in \mathbb{R}^n \ \ \|Ax\|_D \le (cV)^{-\frac{N}{N-n}} N^{1/2} \|x\|_2\right) \le \exp(-cN).$$

This fact was originally established in [TS], and extended in [12] to a
broad class of random matrices with independent entries. However,
the volume ratio theorem doesn't hold for the row product of random
matrices. We show in Lemma 3.2 that there exists a convex symmetric
body $D \subset \mathbb{R}^{d^K}$ with bounded volume ratio, such that

$$\inf_{x \in S^{n-1}} \|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_D \le c(Kd)^{1/2}$$

with probability 1. For $K > 1$ this bound is significantly lower than
$N^{1/2} = d^{K/2}$, which corresponds to the independent entries case.
Surprisingly, despite the fact that the general volume ratio theorem
breaks down, it still holds for the original case of the $\ell_1$ ball. The main
result of this paper is the following theorem.

Theorem 1.6. Let $K, q, n, d$ be natural numbers. Assume that

$$n \le \frac{c d^K}{\log_{(q)} d},$$

and let $A_1, \ldots, A_K$ be $d \times n$ matrices with independent $\delta$ random entries.
Then the $K$-times entry-wise product $A = A_1 \otimes_r A_2 \otimes_r \cdots \otimes_r A_K$
is a $d^K \times n$ matrix satisfying

$$\mathbb{P}\left(\exists x \in S^{n-1} \ \ \|Ax\|_1 \le c' d^K\right) \le C' \exp(-cd).$$

Note that results similar to Theorems 1.3, 1.5, and 1.6 remain
valid if the matrices $A_1, \ldots, A_K$ have different numbers of rows, and
the proofs require only minor changes.

The rest of the paper is organized as follows. In Section 2 we consider
a privacy protection problem from which the study of row products
has originated. We derive an estimate on the minimal amount
of noise needed to avoid a privacy breach from Theorem 1.5. Section
3 introduces necessary notation. Section 4 contains an outline of the
proofs of Theorems 1.3 and 1.6. Theorem 1.3 is proved in the first part
of Section 5. The rest of this section and Section 6 develop technical
tools needed to prove Theorem 1.6.

In Section 7 we introduce a new technical method for obtaining lower
estimates. The minimal norm of Ax over the unit sphere is frequently 
bounded via an e-net argument. The implementation of this approach 
in in] was one of the main sources of the parasitic logarithmic terms. In 
Section 7 the lower bound is handled differently. The required bound is
written as the infimum of a random process. The most powerful method
of controlling the supremum of a random process is to use chaining,
i.e. to represent the process as a sum of increments, and control the
increments separately [21]. Such a method, however, cannot be directly
applied to control the infimum of a positive random process. Indeed,
lower estimates for the increments cannot be automatically combined
to obtain a lower estimate for the sum. Nevertheless, in Lemma 7.1
we develop a variant of chaining, which allows us to control the infimum
of a process. This chaining lemma is the major step in proving Theorem
1.6, which is presented in Section 8, where we also derive Theorem 1.5
from it.

Acknowledgement: the author thanks Dick Windecker and Martin 
Strauss for pointing out to the papers [3 |2ll |3l |6], and the referee for 
correcting numerous typos in the first version of the paper. 



2. Minimal noise for attribute non-privacy 

Marginal, or contingency, tables are the standard way of releasing
statistical summaries of data. Consider a database $D$, which we view
as a $d \times n$ matrix with entries from $\{0, 1\}$. The columns of the matrix
are $n$ individual records, and the rows correspond to $d$ attributes of each
record. Each attribute is binary, so it may be either present or absent.
For any set of $K + 1$ different attributes we release the percentage of
records having all attributes from this set. The list of these values for
all sets forms the contingency table. In the row product notation
the contingency table is the subset of coordinates of the vector

$$y = (\underbrace{D \otimes_r \cdots \otimes_r D}_{K+1 \text{ times}})\, w,$$

which correspond to all sets of $K + 1$ different rows of the matrix $D$.
Here $w \in \mathbb{R}^n$ is the vector with coordinates $w = (1, \ldots, 1)$.
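
The following Python sketch illustrates how a contingency table arises from entry-wise products of rows. It is a toy example under stated assumptions: the helper name `contingency_table` and the sample database are hypothetical, and the entries are normalized by $n$ so that they are fractions of records (the vector $y$ above, as written, gives the corresponding counts).

```python
from itertools import combinations
from typing import Dict, List, Tuple


def contingency_table(db: List[List[int]], set_size: int) -> Dict[Tuple[int, ...], float]:
    """For every set of `set_size` distinct attributes (rows of db), return
    the fraction of records (columns) having all attributes in the set."""
    n = len(db[0])
    table: Dict[Tuple[int, ...], float] = {}
    for attrs in combinations(range(len(db)), set_size):
        # Entry-wise product of the chosen rows: 1 exactly where a record
        # has every chosen attribute.
        product_row = [1] * n
        for i in attrs:
            product_row = [p * v for p, v in zip(product_row, db[i])]
        # Multiplying by the all-ones vector w is just summing the row.
        table[attrs] = sum(product_row) / n
    return table


# Hypothetical database: 3 attributes (rows), 4 records (columns).
D = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]
table = contingency_table(D, 2)
```

Here `table[(0, 1)]` is the fraction of records having both attribute 0 and attribute 1.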

The attribute non-privacy model refers to the situation when $d - 1$
rows of the database $D$ are publicly available, or leaked, and one row
is sensitive. The analysis of a more general case, where there is more
than one sensitive attribute, can be easily reduced to this setting. For
the comparison of this model with other privacy models see [9], and the
references therein. Denote the $(d - 1) \times n$ submatrix of $D$ corresponding
to non-sensitive attributes by $D'$, and the sensitive vector by $x$. Then
the coordinates of $y$ contain all coordinates of the vector

$$z = (\underbrace{D' \otimes_r \cdots \otimes_r D'}_{K \text{ times}} \otimes_r x^T)\, w = (\underbrace{D' \otimes_r \cdots \otimes_r D'}_{K \text{ times}})\, x,$$

which correspond to $K$ different rows of the matrix $D'$. Hence, if the
database $D'$ is generic, then the sensitive vector $x$ can be reconstructed
from $D'$ and the released vector $z$ by solving a linear system. To avoid
this privacy breach, the contingency table is released with some random
noise. This noise should be sufficient to make the reconstruction
impossible, and at the same time small enough, so that the summary
data presented in the contingency table would be reliable. Let $z_{noise}$
be the vector of added noise. Let $\bar{D}'$ be the $\binom{d-1}{K} \times n$ submatrix of
$D' \otimes_r \cdots \otimes_r D'$ corresponding to all $K$-element subsets of $\{1, \ldots, d-1\}$.
If the last singular value of $\bar{D}'$ is positive, then one can form the left
inverse $(\bar{D}')_L^{-1}$ of $\bar{D}'$, and $\|(\bar{D}')_L^{-1}\| = s_n^{-1}(\bar{D}')$. In this case, knowing
the released data $z + z_{noise}$ we can approximate the sensitive vector $x$
by $x' = (\bar{D}')_L^{-1}(z + z_{noise})$. Then

$$\|x - x'\|_2 = \|(\bar{D}')_L^{-1} z_{noise}\|_2 \le \|(\bar{D}')_L^{-1}\| \cdot \|z_{noise}\|_2.$$

Therefore, if $\|z_{noise}\|_2 = o(\sqrt{n} \cdot s_n(\bar{D}'))$, then $\|x - x'\|_2 = o(\sqrt{n})$.
Since the coordinates of $x$ are 0 or 1, we can reconstruct $(1 - o(1))n$
coordinates of $x$ by rounding the coordinates of $x'$. Thus, the lower
estimate of $s_n(\bar{D}')$ provides a lower bound for the norm of the noise
vector.
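
The reconstruction argument can be tried out on a toy instance. The Python sketch below is only an illustration under several assumptions: the public matrix, the secret vector and the noise are hypothetical and deterministic, the full $K$-fold row product (with repeated rows allowed) is used instead of the submatrix indexed by $K$-element subsets, and the left inverse is applied by solving the normal equations with a small hand-written Gaussian elimination.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]


def row_product_power(m: List[List[int]], k: int) -> Matrix:
    """All k-fold entry-wise products of rows of m (rows may repeat)."""
    n = len(m[0])
    rows: Matrix = []
    for idx in product(range(len(m)), repeat=k):
        row = [1.0] * n
        for i in idx:
            row = [r * v for r, v in zip(row, m[i])]
        rows.append(row)
    return rows


def solve(g: Matrix, b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(g, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def least_squares(m: Matrix, z: List[float]) -> List[float]:
    """Apply the left inverse of m to z via the normal equations."""
    n = len(m[0])
    gram = [[sum(r[i] * r[j] for r in m) for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * zi for r, zi in zip(m, z)) for i in range(n)]
    return solve(gram, rhs)


# Hypothetical public attributes: column j holds the binary digits of j + 1,
# which makes the 3-fold row product have full column rank.
n = 7
public = [[(j >> bit) & 1 for j in range(1, n + 1)] for bit in range(3)]
secret = [1, 0, 1, 1, 0, 0, 1]            # hypothetical sensitive row x
M = row_product_power(public, 3)           # 27 x 7 matrix
z = [sum(r * s for r, s in zip(row, secret)) for row in M]
noisy = [zi + (0.005 if i % 2 == 0 else -0.005) for i, zi in enumerate(z)]
estimate = least_squares(M, noisy)         # x' in the notation above
recovered = [round(v) for v in estimate]   # rounding step
```

With noise this small relative to the last singular value of `M`, rounding `estimate` returns the secret row exactly, which is the privacy breach described in the text.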
We analyze below the case of a random database. Assume that the
entries of the database are independent $\{0, 1\}$ variables, and the entries
in the same row are identically distributed. This means that the
distribution of any given attribute is the same for each record, but
different attributes can be distributed differently. We exclude almost
degenerate attributes, i.e. the attributes having probabilities very close
to 0 or 1. In this case a bound on the minimal amount of noise follows
from

Theorem 2.1. Let $K, q, n, d$ be natural numbers. Assume that

$$n \le \frac{c d^K}{\log_{(q)} d}.$$

Let $0 < p' < p'' < 1$, and let $p_1, \ldots, p_d$ be any numbers such that $p' \le
p_j \le p''$. Consider a $d \times n$ matrix $A$ with independent Bernoulli entries
$a_{j,k}$ satisfying $\mathbb{P}(a_{j,k} = 1) = p_j$ for all $j = 1, \ldots, d$, $k = 1, \ldots, n$.

Then the $K$-times entry-wise product $\bar{A} = A \otimes_r A \otimes_r \cdots \otimes_r A$ is a
$d^K \times n$ matrix satisfying

$$\mathbb{P}\left(s_n(\bar{A}) \le c'\sqrt{d^K}\right) \le C \exp(-cd).$$

The constants $c, c', C, C'$ may depend upon the parameters $K, q, p', p''$.

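Proof. This theorem will follow from Theorem 1.5 after we pass to the
row product of matrices having independent $\delta$ random entries. To this
end, notice that if an $m \times n$ matrix $U'$ is formed from the $M \times n$ matrix
$U$ by taking a subset of rows, then $s_n(U') \le s_n(U)$.

Let $d = 2Kd' + m$, where $0 \le m < 2K$. For $j = 1, \ldots, K$ denote by
$A_j^0$ the submatrix of $A$ consisting of rows $2d'(j-1)+1, \ldots, 2d'(j-1)+d'$,
and by $A_j^1$ the submatrix consisting of rows $2d'(j-1)+d'+1, \ldots, 2d'j$.
Let $D_j^0, D_j^1 \in \mathbb{R}^{d'}$ be vectors with coordinates
$D_j^0 = (p_{2d'(j-1)+1}, \ldots, p_{2d'(j-1)+d'})$ and $D_j^1 = (p_{2d'(j-1)+d'+1}, \ldots, p_{2d'j})$. Set

$$A_j = D_j^0 \otimes_r A_j^1 - D_j^1 \otimes_r A_j^0.$$

Then $A_1, \ldots, A_K$ are $d' \times n$ matrices with independent $\delta$ random entries
for some $\delta$ depending on $p', p''$.

Let $U_s$, $s = 1, 2, 3$ be $N_s \times n$ matrices, and let $D \in \mathbb{R}^{N_2}$ be a vector
with coordinates satisfying $|d_j| \le 1$ for all $j$. Then for any $x \in \mathbb{R}^n$

$$\|(U_1 \otimes_r (D^T \otimes_r U_2) \otimes_r U_3)x\|_2 \le \|(U_1 \otimes_r U_2 \otimes_r U_3)x\|_2.$$

Indeed, any coordinate of $(U_1 \otimes_r (D^T \otimes_r U_2) \otimes_r U_3)x$ equals the corresponding
coordinate of $(U_1 \otimes_r U_2 \otimes_r U_3)x$ multiplied by some $d_j$, so
the inequality above follows from the bound on $|d_j|$. This argument
shows that for any $(\varepsilon_1, \ldots, \varepsilon_K) \in \{0, 1\}^K$

$$\left\|\left((D_1^{1-\varepsilon_1} \otimes_r A_1^{\varepsilon_1}) \otimes_r \cdots \otimes_r (D_K^{1-\varepsilon_K} \otimes_r A_K^{\varepsilon_K})\right)x\right\|_2 \le \left\|(A_1^{\varepsilon_1} \otimes_r \cdots \otimes_r A_K^{\varepsilon_K})x\right\|_2.$$

Therefore,

$$\|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_2 \le \sum_{\varepsilon = (\varepsilon_1, \ldots, \varepsilon_K) \in \{0,1\}^K} \|(A_1^{\varepsilon_1} \otimes_r \cdots \otimes_r A_K^{\varepsilon_K})x\|_2 \le 2^K \|(A \otimes_r \cdots \otimes_r A)x\|_2,$$

because $A_1^{\varepsilon_1} \otimes_r \cdots \otimes_r A_K^{\varepsilon_K}$ is a submatrix of $A \otimes_r \cdots \otimes_r A$. Thus, for any
$t > 0$

$$\mathbb{P}\left(s_n(A \otimes_r \cdots \otimes_r A) \le t\right) \le \mathbb{P}\left(s_n(A_1 \otimes_r \cdots \otimes_r A_K) \le 2^K t\right).$$

To complete the proof we use Theorem 1.5 with $d'$ in place of $d$, and
note that $d \le 3Kd'$. □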
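
The row-scaling inequality used in the proof above can be checked numerically. The Python sketch below is a sanity check under stated assumptions only: the helper names, matrix sizes and random seed are hypothetical, and scaling the rows of the middle factor by numbers $d_j$ with $|d_j| \le 1$ stands in for the operation written as a row product with $D$ in the text.

```python
import math
import random
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def row_product_many(mats: Sequence[Matrix]) -> Matrix:
    """Row product of several matrices with the same number of columns."""
    n = len(mats[0][0])
    rows: Matrix = []
    for combo in product(*mats):          # one row taken from each matrix
        row = [1.0] * n
        for r in combo:
            row = [a * b for a, b in zip(row, r)]
        rows.append(row)
    return rows


def image_norm(m: Matrix, x: List[float]) -> float:
    """Euclidean norm of the vector m x."""
    return math.sqrt(sum(sum(a * b for a, b in zip(row, x)) ** 2 for row in m))


def random_sign_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.choice((-1.0, 1.0)) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n = 5
u1 = random_sign_matrix(rng, 2, n)
u2 = random_sign_matrix(rng, 3, n)
u3 = random_sign_matrix(rng, 2, n)
d = [rng.uniform(-1.0, 1.0) for _ in u2]            # scaling factors, |d_j| <= 1
scaled_u2 = [[dj * v for v in row] for dj, row in zip(d, u2)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

lhs = image_norm(row_product_many([u1, scaled_u2, u3]), x)
rhs = image_norm(row_product_many([u1, u2, u3]), x)
```

Each coordinate of the scaled product is the corresponding coordinate of the unscaled one multiplied by some $d_j$, so `lhs` cannot exceed `rhs` (up to floating-point rounding).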

3. Notation and preliminary results 

The coordinates of a vector $x \in \mathbb{R}^n$ are denoted by $(x(1), \ldots, x(n))$.
Throughout the paper we will intermittently consider $x$ as a vector in
$\mathbb{R}^n$ and as an $n \times 1$ matrix. The sequence $e_1, \ldots, e_n$ stands for the
standard basis in $\mathbb{R}^n$. For $1 \le p \le \infty$ denote by $B_p^n$ the unit ball of
the space $\ell_p^n$:

$$B_p^n = \Big\{ x \in \mathbb{R}^n \ \Big|\ \|x\|_p = \Big(\sum_{j=1}^{n} |x(j)|^p\Big)^{1/p} \le 1 \Big\}.$$

By $S^{n-1}$ we denote the Euclidean unit sphere.

Denote by $\|A\|$ the operator norm of the matrix $A$, and by $\|A\|_{HS}$
the Hilbert-Schmidt norm:

$$\|A\|_{HS} = \Big(\sum_{i,j} |a_{i,j}|^2\Big)^{1/2}.$$

The volume of a convex set $D \subset \mathbb{R}^n$ will be denoted $\mathrm{vol}(D)$, and the
cardinality of a finite set $J$ by $|J|$. By $\lfloor x \rfloor$ we denote the integer part
of $x \in \mathbb{R}$. Throughout the paper we denote by $K$ the number of terms
in the row product, by $q$ the number of iterations of logarithm, and
by $\delta^2$ the minimum of the variances of the entries of random matrices.
$C$, $c$ etc. denote constants, which may depend on the parameters $K$, $q$,
and $\delta$, and whose value may change from line to line.
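Let $\mathcal{K} \subset \mathbb{R}^n$ be a compact set, and let $\varepsilon > 0$. A set $\mathcal{N} \subset \mathcal{K}$ is called
an $\varepsilon$-net if for any $x \in \mathcal{K}$ there exists $y \in \mathcal{N}$ such that $\|x - y\|_2 \le \varepsilon$.
If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator, and $\mathcal{N}$ and $\mathcal{N}'$ are $\varepsilon$-nets in $B_2^n$
and $B_2^m$ respectively, then

$$\|T\| \le (1 - \varepsilon)^{-1} \sup_{x \in \mathcal{N}} \|Tx\|_2 \le (1 - \varepsilon)^{-2} \sup_{x \in \mathcal{N}} \sup_{y \in \mathcal{N}'} \langle Tx, y \rangle.$$

We will use the following volumetric estimate. Let $V \subset B_2^n$. Then for
any $\varepsilon < 1$ there exists an $\varepsilon$-net $\mathcal{N} \subset V$ such that

$$|\mathcal{N}| \le \left(\frac{3}{\varepsilon}\right)^n.$$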
We will repeatedly use Talagrand's measure concentration inequality 
for convex functions (see [20], Theorem 6.6, or [1^, Corollary 4.9). 

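Theorem (Talagrand). Let $X_1, \ldots, X_n$ be independent random variables
with values in $[-1, 1]$. Let $f : [-1, 1]^n \to \mathbb{R}$ be a convex $L$-Lipschitz
function, i.e.

$$\forall x, y \in [-1, 1]^n \quad |f(x) - f(y)| \le L \|x - y\|_2.$$

Denote by $M$ the median of $f(X_1, \ldots, X_n)$. Then for any $t > 0$,

$$\mathbb{P}\left(|f(X_1, \ldots, X_n) - M| \ge t\right) \le 4 \exp\left(-\frac{t^2}{16 L^2}\right).$$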

To estimate various norms we will divide the coordinates of a vector
$x \in \mathbb{R}^n$ into blocks. Let $\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}$ be a permutation
rearranging the absolute values of the coordinates of $x$ in non-increasing
order: $|x(\pi(1))| \ge \ldots \ge |x(\pi(n))|$. For $l < n$ and $0 \le m$
define

$$N_0 = 0, \quad N_m = \sum_{j=0}^{m-1} 4^j l, \quad \text{and set } I_m = \pi\left(\{N_m + 1, \ldots, N_{m+1}\}\right).$$

In other words, $I_0$ contains the $l$ largest coordinates of $|x|$, $I_1$ contains the $4l$
next largest, etc. We continue as long as $I_m \ne \emptyset$. The block $I_m$ will
be called the $m$-th block of type $l$ of the coordinates of $x$. Denote by $x|_I$
the restriction of $x$ to the coordinates from the set $I$. We need the
following standard

Lemma 3.1. Let $b < 1$ and let $x \in B_2^n \cap b B_\infty^n$. For $l \le b^{-2}$ consider
blocks $I_0, I_1, \ldots$ of type $l$ of the coordinates of $x$. Then

$$\sum_{m \ge 0} |I_m| \cdot \|x|_{I_m}\|_\infty^2 \le 5.$$

Proof. Note that the absolute value of any non-zero coordinate of $x|_{I_{m-1}}$
is greater than or equal to $\|x|_{I_m}\|_\infty$. Hence,

$$\sum_{m \ge 0} |I_m| \cdot \|x|_{I_m}\|_\infty^2 \le l\, \|x|_{I_0}\|_\infty^2 + \sum_{m \ge 1} 4 |I_{m-1}| \cdot \|x|_{I_m}\|_\infty^2 \le l b^2 + 4 \sum_{m \ge 1} \|x|_{I_{m-1}}\|_2^2 \le 5.$$

□
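
The block decomposition is easy to compute. The Python sketch below splits the coordinates of a vector into blocks of type $l$ (sizes $l, 4l, 16l, \ldots$, in order of decreasing absolute value) and evaluates the sum $\sum_m |I_m| \cdot \|x|_{I_m}\|_\infty^2$ that Lemma 3.1, as read here, bounds by 5. The function names and the sample vector are assumptions for illustration; indices are 0-based and ties are broken by position.

```python
import math
from typing import List


def blocks_of_type(x: List[float], l: int) -> List[List[int]]:
    """Split coordinate indices of x into blocks I_0, I_1, ... of sizes
    l, 4l, 16l, ..., taken in order of non-increasing absolute value."""
    order = sorted(range(len(x)), key=lambda j: -abs(x[j]))
    blocks: List[List[int]] = []
    start, size = 0, l
    while start < len(order):
        blocks.append(order[start:start + size])   # last block may be shorter
        start += size
        size *= 4
    return blocks


def block_sum(x: List[float], l: int) -> float:
    """Sum over blocks of |I_m| times the squared sup-norm of x on I_m."""
    return sum(len(b) * max(abs(x[j]) for j in b) ** 2 for b in blocks_of_type(x, l))


v = [0.1, -0.9, 0.5, 0.3, -0.2, 0.7, 0.0]
example_blocks = blocks_of_type(v, 1)
norm_v = math.sqrt(sum(c * c for c in v))
unit = [c / norm_v for c in v]        # a point of the Euclidean unit ball
example_sum = block_sum(unit, 1)
```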

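The next lemma shows that Theorem 1.6 cannot be extended from the
$L_1$ norm to a general Banach space whose unit ball has a bounded
volume ratio.

Lemma 3.2. There exists a convex symmetric body $D \subset \mathbb{R}^{d^K}$ such that
$B_2^{d^K} \subset D$,

$$\left(\frac{\mathrm{vol}(D)}{\mathrm{vol}(B_2^{d^K})}\right)^{1/d^K} \le C,$$

satisfying

$$\inf_{x \in S^{n-1}} \|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_D \le c(Kd)^{1/2}$$

for all $d \times n$ matrices $A_1, \ldots, A_K$ with entries $1$ or $-1$.

Proof. Set

$$W = \{\varepsilon_1 \otimes \cdots \otimes \varepsilon_K \mid \varepsilon_1, \ldots, \varepsilon_K \in \{-1, 1\}^d\}$$

and let $D = \mathrm{conv}\left((dK)^{-1/2} W, B_2^{d^K}\right)$. To estimate the volume ratio of
$D$ we use Urysohn's inequality [13]:

$$\left(\frac{\mathrm{vol}(D)}{\mathrm{vol}(B_2^{d^K})}\right)^{1/d^K} \le d^{-K/2}\, \mathbb{E} \sup_{x \in D} \langle g, x \rangle,$$

where $g$ is a standard Gaussian vector in $\mathbb{R}^{d^K}$. Since

$$D \subset (dK)^{-1/2} \mathrm{conv}(W) + B_2^{d^K},$$

the right hand side of the previous inequality is bounded by

$$1 + d^{-K/2} \cdot (dK)^{-1/2}\, \mathbb{E} \sup_{x \in W} \langle g, x \rangle \le 1 + c (dK)^{-1/2} \log^{1/2} |W|,$$

where $|W|$ is the cardinality of $W$. Since $|W| = 2^{dK}$, the volume ratio
of $D$ is bounded by an absolute constant.

Let $e_1$ be the first basic vector of $\mathbb{R}^n$. The lemma now follows from
the equality $(A_1 \otimes_r \cdots \otimes_r A_K)e_1 = \varepsilon_1 \otimes \cdots \otimes \varepsilon_K$, where $\varepsilon_1, \ldots, \varepsilon_K$ are
the first columns of the matrices $A_1, \ldots, A_K$. □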
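
The identity closing the proof of Lemma 3.2 can be verified on a small example. The Python sketch below is an illustration with hypothetical helper names and sizes: it checks that the first column of a row product of $\pm 1$ matrices equals the tensor product of their first columns, with rows listed in lexicographic order of the row indices.

```python
import random
from itertools import product
from typing import List, Sequence


def row_product_many(mats: Sequence[List[List[int]]]) -> List[List[int]]:
    """Row product of several matrices, rows in lexicographic index order."""
    n = len(mats[0][0])
    rows = []
    for combo in product(*mats):          # one row taken from each matrix
        row = [1] * n
        for r in combo:
            row = [a * b for a, b in zip(row, r)]
        rows.append(row)
    return rows


def tensor(vectors: Sequence[List[int]]) -> List[int]:
    """Coordinates of v_1 (x) ... (x) v_K in lexicographic order."""
    out = []
    for combo in product(*vectors):
        value = 1
        for v in combo:
            value *= v
        out.append(value)
    return out


rng = random.Random(1)
d, n, K = 3, 4, 3
mats = [[[rng.choice((-1, 1)) for _ in range(n)] for _ in range(d)] for _ in range(K)]
A = row_product_many(mats)
first_column = [row[0] for row in A]                      # this is A e_1
first_columns = [[row[0] for row in m] for m in mats]     # eps_1, ..., eps_K
```

Since all entries are $\pm 1$, the squared Euclidean norm of `first_column` is $d^K$, which is also the computation behind the norm lower bound $d^{K/2}$ for such matrices.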
4. Outline of the proof

We begin with proving Theorem 1.3. We use the moment method,
which is one of the standard random matrix theory tools. To estimate
the norm of a rectangular random matrix $A$ with centered entries, one
considers the matrix $(A^*A)^p$ for some large $p \in \mathbb{N}$, and evaluates the
expectation of its trace using combinatorics. Since

$$\|A\|^{2p} \le \mathrm{tr}(A^*A)^p,$$

any estimate of the trace translates into an estimate for the norm.
Following a variant of this approach, developed in [1], we obtain an
upper bound for the norm of the row product of independent random
matrices, which is valid with probability close to 1. However, the moment
method alone is insufficient to obtain an exponential bound for
the probability. To improve the probability estimate, we combine the
bound for the median of the norm, obtained by the moment method,
and a measure concentration theorem. To this end we extend Talagrand's
measure concentration theorem for convex functions to
functions which are polyconvex, i.e. convex with respect to certain
subsets of coordinates.
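
The deterministic inequality behind the moment method, namely that the operator norm is at most $(\mathrm{tr}(A^*A)^p)^{1/(2p)}$, can be illustrated numerically. The Python sketch below uses a hypothetical seeded $\pm 1$ matrix and hand-written helpers; the norm is approximated from below by power iteration, which is an illustrative choice and not the method of the paper.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gram(a: Matrix) -> Matrix:
    """Return the n x n matrix A^T A."""
    n = len(a[0])
    return [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def trace_bound(a: Matrix, p: int) -> float:
    """Upper bound (tr (A^T A)^p)^(1/(2p)) for the operator norm of a."""
    g = gram(a)
    power = g
    for _ in range(p - 1):
        power = matmul(power, g)
    return sum(power[i][i] for i in range(len(g))) ** (1.0 / (2 * p))


def norm_lower_bound(a: Matrix, iterations: int = 200) -> float:
    """Lower estimate of the operator norm via power iteration on A^T A."""
    g = gram(a)
    n = len(g)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    rayleigh = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
    return math.sqrt(max(rayleigh, 0.0))


rng = random.Random(2)
A = [[rng.choice((-1.0, 1.0)) for _ in range(4)] for _ in range(6)]
lower = norm_lower_bound(A)
bounds = {p: trace_bound(A, p) for p in (1, 2, 4, 8)}
```

The values in `bounds` decrease toward the norm as $p$ grows, which is why a large power $p$ is taken in the moment method.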

Before tackling the small ball probability estimate for

$$\min_{x \in S^{n-1}} \|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_1,$$

we consider an easier problem of finding a lower bound for
$\|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_1$ for a fixed vector $x \in S^{n-1}$. The entries
of the row product are not independent, so to take advantage of
independence, we condition on $A_1, \ldots, A_{K-1}$. To use Talagrand's theorem
in this context, we have to bound the Lipschitz constant of this
norm above, and the median of it below. Such bounds are not available
for all matrices $A_1, \ldots, A_{K-1}$, but they can be obtained for "typical"
matrices, namely outside of a set of small probability. Moreover, the
bounds will depend on the vector $x$, so to obtain them, we have to
prove these estimates for all submatrices of the row product. This is
done in Sections 5.2 and 5.3. Using these results, we bound the small
ball probability in Section 6. Actually, we prove a stronger estimate for
the Levy concentration function, which is the supremum of the small
ball probabilities over all balls of a fixed radius.

The final step of the proof is combining the individual small ball
probability estimates to obtain an estimate of the minimal $\ell_1$-norm
over the sphere. This is usually done by introducing an $\varepsilon$-net, and
approximating a point on the sphere by its element. Since the small
ball probability depends on the direction of the vector $x$, one $\varepsilon$-net
would not be enough. A modification of this method, using several
$\varepsilon$-nets, was developed in [11]. However, its implementation for the row
products leads to the appearance of parasitic logarithmic terms, whose
degrees rapidly grow with $K$. To avoid these terms, we develop a new
chaining argument in Section 7. Unlike the standard chaining argument,
which is used to bound the supremum of a random process, the method
of Section 7 applies to the infimum.

In Section 8 we combine the chaining lemma with the Levy concentration
function bound of Section 6 to complete the proof of Theorem
1.6 and derive Theorem 1.5 from it. We also show that the image of
$\mathbb{R}^n$ under the row product of random matrices is a Kashin subspace,
i.e. the $\ell_1$ and $\ell_2$ norms are equivalent on this space.


5. Norm estimates 

5.1. Norm of the matrix. We start with a preliminary estimate of
the operator norm of the row product of random matrices. To this end
we use the moment method, which is based on bounding the expectation
of the trace of high powers of the matrix. This approach, which is
standard in the theory of random matrices with independent entries,
carries over to the row product setting as well.

Theorem 5.1. Let $A_1, \ldots, A_K$ be $d \times n$ matrices with independent $\delta$
random entries. Let $p \in \mathbb{N}$ be a number such that $p \le c n^{1/(12K)}$. Then
the $K$-times entry-wise product $A = A_1 \otimes_r A_2 \otimes_r \cdots \otimes_r A_K$ is a $d^K \times n$
matrix satisfying

$$\mathbb{E}\|A\|^{2p} \le p^{2K+1}\, n \left(d^{1/2} + n^{1/(2K)}\right)^{2Kp}.$$

Proof. The proof of this theorem closely follows [4], so we will only
sketch it. Denote the entries of the matrix $A_l$ by $\delta^{(l)}_{i,j}$, so the entry of
the matrix $A$ corresponding to the product of the entries in the rows
$i^{(1)}, i^{(2)}, \ldots, i^{(K)}$ and column $j$ will be denoted $\delta^{(1)}_{i^{(1)},j} \cdots \delta^{(K)}_{i^{(K)},j}$. Then



$$\mathbb{E}\|A\|^{2p} \le \mathbb{E}\, \mathrm{tr}(AA^T)^p = \sum_{V} \mathbb{E} \prod_{m=1}^{2p} \delta^{(1)}_{i^{(1)}_m, j_m} \cdots \delta^{(K)}_{i^{(K)}_m, j_m}.$$

Here $V$ is the set of admissible multi-paths, i.e. sequences of $2p$ lists
$\{(i^{(1)}_m, j_m), \ldots, (i^{(K)}_m, j_m)\}_{m=1}^{2p}$ such that

(1) the column number $j_m$ is the same for all entries of the list $m$;

(2) the first list is arbitrary;

(3) the entries of the second list are in the same column as the
entries of the first list, the entries of the third list are in the
same rows as the respective entries of the second list, etc.;

(4) the entries of the last list are in the same rows as the respective
entries of the first list;

(5) every entry, appearing in each path, appears at least twice.

Since the entries of the matrices $A_1, \ldots, A_K$ are uniformly bounded,
the expectations are uniformly bounded as well, so

$$\mathbb{E}\|A\|^{2p} \le |V|.$$

To estimate the cardinality of $V$ denote by $\beta(r_1, \ldots, r_K, c)$ the number
of admissible multi-paths whose entries are taken from exactly $r_1$
rows of the matrix $A_1$, exactly $r_2$ rows of the matrix $A_2$, etc., and
exactly from $c$ columns of each matrix. Note that the set of columns
through which the path goes is common for the matrices $A_1, \ldots, A_K$.
An admissible multi-path can be viewed as an ordered $K$-tuple of closed
paths $q_1, \ldots, q_K$ of length $2p + 1$ in the $d \times n$ bi-partite graph, such
that $q_1(2j) = q_2(2j) = \ldots = q_K(2j)$ for $j = 1, \ldots, p$, and each edge is
traveled at least twice for each path. With this notation we have

$$\mathbb{E}\|A\|^{2p} \le \sum_{J} \beta(r_1, \ldots, r_K, c), \tag{5.1}$$

where $J$ is the set of sequences of natural numbers $(r_1, \ldots, r_K, c)$ satisfying

$$r_l + c \le p + 1 \quad \text{for each } l = 1, \ldots, K.$$

The inequality here follows from condition (5) above. Let $\gamma(r_1, \ldots, r_K, c)$
be the number of admissible multi-paths which go through the first $r_1$
rows of the matrix $A_1$, the first $r_2$ rows of the matrix $A_2$, etc., and the
first $c$ columns. Then

$$\beta(r_1, \ldots, r_K, c) \le \binom{n}{c} \cdot \prod_{l=1}^{K} \binom{d}{r_l} \cdot \gamma(r_1, \ldots, r_K, c).$$

We call a closed path of length $2p + 1$ in the $d \times n$ bi-partite
graph standard if

(1) it starts with the edge $(1, 1)$;

(2) if the path visits a new left (right) vertex, then its number is
the minimal among the left (right) vertices which have not yet
been visited by this path;

(3) each edge in the path is traveled at least twice.

Let $m(r, c)$ be the number of the standard paths through $r$ left vertices
and $c$ right vertices of the bi-partite graph. Then

$$\gamma(r_1, \ldots, r_K, c) \le c! \cdot \prod_{l=1}^{K} r_l! \cdot m(r_l, c).$$

This inequality follows from the fact that all $K$ paths in the admissible
multi-path visit a new column vertex at the same time, so the column
vertex enumeration defined by different paths of the same multi-path
is consistent. Combining the two previous estimates, we get

$$\beta(r_1, \ldots, r_K, c) \le n^c \cdot \prod_{l=1}^{K} d^{r_l}\, m(r_l, c).$$

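The inequality on page 260 of [1] reads

$$m(r, c) \le \binom{p}{r}^2 p^{12(p-r-c)+14}.$$

Substituting it into the inequality above, we obtain

$$\sum_{J} \beta(r_1, \ldots, r_K, c) \le \sum_{c=1}^{p} \sum_{r_1 + c \le p+1} \cdots \sum_{r_K + c \le p+1} n^c \prod_{l=1}^{K} d^{r_l} \binom{p}{r_l}^2 p^{12(p-r_l-c)+14} = \sum_{c=1}^{p} \prod_{l=1}^{K} \sum_{r_l=1}^{p+1-c} n^{c/K} d^{r_l} \binom{p}{r_l}^2 p^{12(p-r_l-c)+14}. \tag{5.2}$$

To estimate the last quantity note that since $p \le c n^{1/(12K)}$,

$$\sum_{r_l=1}^{p+1-c} n^{c/K} d^{r_l} \binom{p}{r_l}^2 p^{12(p-r_l-c)+14} = p^2 n^{1/K} \sum_{r_l=1}^{p+1-c} p^{12(p+1-r_l-c)}\, n^{-(p+1-r_l-c)/K} \binom{p}{r_l}^2 d^{r_l} n^{(p-r_l)/K} \le p^2 n^{1/K} \left(d^{1/2} + n^{1/(2K)}\right)^{2p}.$$

Finally, combining this with (5.1) and (5.2), we conclude

$$\mathbb{E}\|A\|^{2p} \le p^{2K+1}\, n \left(d^{1/2} + n^{1/(2K)}\right)^{2Kp}.$$

□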

Applying Chebychev's inequality, we can derive a large deviation
estimate from the moment estimate of Theorem 5.1.

Corollary 5.2. Under the conditions of Theorem 5.1,

$$\mathbb{P}\left(\|A_1 \otimes_r \cdots \otimes_r A_K\| \ge C'(d^{K/2} + n^{1/2})\right) \le \exp\left(-c n^{1/(12K)}\right).$$

Remark 5.3. The bound for the norm appearing in Corollary 5.2
matches that for a random matrix with centered i.i.d. entries. This
bound is optimal for the row products as well. To see it, assume that
the entries of $A_1, \ldots, A_K$ are independent $\pm 1$ random variables. Then

$$\|Ae_1\|_2 = d^{K/2}.$$

Also, if $x \in S^{n-1}$ is such that $x(j) = n^{-1/2}\delta_{1,j}$, where
$\delta_{1,j}$ is an entry in the first row of the matrix $A$, then

$$\|Ax\|_2 \ge n^{1/2}.$$

More precise versions of the moment method show that the moment
bound of the type of Theorem 5.1 is valid for bigger values of $p$ as well,
and lead to a more precise large deviation bound. We do not pursue
this direction here, since these bounds are not powerful enough for our
purposes.
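
The second lower bound of Remark 5.3 can be checked directly. In the Python sketch below (hypothetical helper names, sizes and seed), $x$ is the first row of a row product of $\pm 1$ matrices divided by $\sqrt{n}$; the first coordinate of $Ax$ is then exactly $\sqrt{n}$ up to rounding, so the norm of $Ax$ is at least $\sqrt{n}$.

```python
import math
import random
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def row_product_many(mats: Sequence[Matrix]) -> Matrix:
    """Row product of several matrices with the same number of columns."""
    n = len(mats[0][0])
    rows: Matrix = []
    for combo in product(*mats):          # one row taken from each matrix
        row = [1.0] * n
        for r in combo:
            row = [a * b for a, b in zip(row, r)]
        rows.append(row)
    return rows


rng = random.Random(3)
d, n, K = 3, 6, 2
mats = [[[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(d)] for _ in range(K)]
A = row_product_many(mats)

# x(j) = n^{-1/2} * (entry j of the first row of A); x is a unit vector
# because every entry of A is +-1.
x = [entry / math.sqrt(n) for entry in A[0]]
Ax = [sum(a * b for a, b in zip(row, x)) for row in A]
norm_Ax = math.sqrt(sum(v * v for v in Ax))
```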

Instead, we use the previous corollary to bound the median of the 
norm of Ai (8>r • • • ®r and apply measure concentration. The stan- 
dard tool for deriving measure concentration results for norms of ran- 
dom matrices is Talagrand's measure concentration theorem for convex 
functions. However, this theorem is not available in our context, since 
the norm of Ai ®r • • • ®r is not a convex function of the entries 
of Ai, . . . , Ak- We will modify this theorem to apply it to polyconvex 
functions. 
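The gap between convexity in each factor and joint convexity is already visible for $1 \times 1$ matrices, where the row product is the ordinary product of two numbers. The following Python sketch illustrates this; the helper names are ours and do not come from the paper.

```python
def row_product_norm_1x1(a, b):
    """Operator norm of the row product of the 1 x 1 matrices [a] and [b]."""
    return abs(a * b)


def midpoint_convexity_gap(f, p, q):
    """Return f(midpoint of p, q) - (f(p) + f(q)) / 2.

    A strictly positive value means that f violates convexity on the segment [p, q].
    """
    mid = tuple((s + t) / 2 for s, t in zip(p, q))
    return f(*mid) - (f(*p) + f(*q)) / 2


# Jointly in (a, b): both endpoints have norm 0, the midpoint has norm 1/4.
joint_gap = midpoint_convexity_gap(row_product_norm_1x1, (1.0, 0.0), (0.0, 1.0))

# In the first factor alone (second factor frozen at b = 1) the gap is non-positive.
partial_gap = midpoint_convexity_gap(
    lambda a: row_product_norm_1x1(a, 1.0), (-1.0,), (3.0,)
)
```

Here `joint_gap` is positive, so the norm is not jointly convex, while `partial_gap` is non-positive, consistent with convexity in a single factor.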

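Lemma 5.4. Consider a function $F : \mathbb{R}^{KM} \to \mathbb{R}$. For $1 \le k \le K$ and
$x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_K \in \mathbb{R}^M$ define a function
$f_{x_1,\dots,x_{k-1},x_{k+1},\dots,x_K} : \mathbb{R}^M \to \mathbb{R}$ by

\[ f_{x_1,\dots,x_{k-1},x_{k+1},\dots,x_K}(x) = F(x_1, \dots, x_{k-1}, x, x_{k+1}, \dots, x_K). \]

Assume that for all $1 \le k \le K$ and for all $x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_K \in B_\infty^M$
the functions $f_{x_1,\dots,x_{k-1},x_{k+1},\dots,x_K}$ are $L$-Lipschitz and convex.
Let $(\varepsilon_1, \dots, \varepsilon_K) = ((\nu_{1,1}, \dots, \nu_{1,M}), \dots, (\nu_{K,1}, \dots, \nu_{K,M})) \in \mathbb{R}^{KM}$ be
a set of independent random variables, whose absolute values are uniformly
bounded by 1. If

\[ \mathbb{P}\bigl(F(\varepsilon_1, \dots, \varepsilon_K) > \mu\bigr) \le 2 \cdot 4^{-K}, \]

then for any $t > 0$

\[ \mathbb{P}\bigl(F(\varepsilon_1, \dots, \varepsilon_K) > \mu + t\bigr) \le 4^K \exp\Bigl(-\frac{ct^2}{K^2L^2}\Bigr). \]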

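Proof. We prove this lemma by induction on $K$. In case $K = 1$ the
assertion of the lemma follows immediately from Talagrand's measure
concentration theorem for convex functions.

Assume that the lemma holds for $K - 1$. Let $F : \mathbb{R}^{KM} \to \mathbb{R}$ be a
function satisfying the assumptions of the lemma. Set

\[ \Omega = \bigl\{ (x_1, \dots, x_{K-1}) \in B_\infty^{(K-1)M} \mid \mathbb{P}\bigl(F(x_1, \dots, x_{K-1}, \varepsilon_K) > \mu\bigr) > 1/2 \bigr\}. \]

Then Chebychev's inequality yields

\[ \mathbb{P}\bigl((\varepsilon_1, \dots, \varepsilon_{K-1}) \in \Omega\bigr) \le 4^{-(K-1)}. \tag{5.3} \]

By Talagrand's theorem, for any $(x_1, \dots, x_{K-1}) \in B_\infty^{(K-1)M} \setminus \Omega$,

\[ \mathbb{P}\Bigl( F(x_1, \dots, x_{K-1}, \varepsilon_K) > \mu + \frac{t}{K} \Bigr) \le 2\exp\Bigl(-\frac{ct^2}{K^2L^2}\Bigr). \]

Hence,

\[ \mathbb{P}\Bigl( F(\varepsilon_1, \dots, \varepsilon_K) > \mu + \frac{t}{K} \Bigm| (\varepsilon_1, \dots, \varepsilon_{K-1}) \in B_\infty^{(K-1)M} \setminus \Omega \Bigr) \le 2\exp\Bigl(-\frac{ct^2}{K^2L^2}\Bigr). \]

Define

\[ \Xi = \Bigl\{ x_K \in B_\infty^M \Bigm| \mathbb{P}\Bigl( F(\varepsilon_1, \dots, \varepsilon_{K-1}, x_K) > \mu + \frac{t}{K} \Bigm| (\varepsilon_1, \dots, \varepsilon_{K-1}) \in B_\infty^{(K-1)M} \setminus \Omega \Bigr) > 4^{-(K-1)} \Bigr\}. \]

The previous estimate and Chebychev's inequality imply

\[ \mathbb{P}(\varepsilon_K \in \Xi) \le 2 \cdot 4^{K-1} \exp\Bigl(-\frac{ct^2}{K^2L^2}\Bigr). \]

If $x_K \in \Xi^c$, then combining the conditional probability bound with the
estimate (5.3), we obtain

\[ \mathbb{P}\Bigl( F(\varepsilon_1, \dots, \varepsilon_{K-1}, x_K) > \mu + \frac{t}{K} \Bigr)
 \le \mathbb{P}\Bigl( F(\varepsilon_1, \dots, \varepsilon_{K-1}, x_K) > \mu + \frac{t}{K} \Bigm| (\varepsilon_1, \dots, \varepsilon_{K-1}) \in B_\infty^{(K-1)M} \setminus \Omega \Bigr) + \mathbb{P}(\Omega)
 \le 2 \cdot 4^{-(K-1)}. \]

Hence, applying the induction hypothesis with $\frac{K-1}{K}\,t$ in place of $t$, we get

\[ \mathbb{P}\Bigl( F(\varepsilon_1, \dots, \varepsilon_{K-1}, x_K) > \mu + \frac{t}{K} + \frac{K-1}{K}\,t \Bigr) \le 4^{K-1} \exp\Bigl(-\frac{ct^2}{K^2L^2}\Bigr). \]

Finally,

\[ \mathbb{P}\bigl( F(\varepsilon_1, \dots, \varepsilon_K) > \mu + t \bigr)
 \le \mathbb{P}\bigl( F(\varepsilon_1, \dots, \varepsilon_K) > \mu + t \bigm| \varepsilon_K \in B_\infty^M \setminus \Xi \bigr) + \mathbb{P}(\varepsilon_K \in \Xi)
 \le 4^K \exp\Bigl(-\frac{ct^2}{K^2L^2}\Bigr), \]

which completes the proof of the induction step. □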

This concentration inequality combined with Corollary 5.2 allows us to
establish the correct probability bound for large deviations of the norm
of the row product of random matrices.

Proof of Theorem 1.3. For $k = 1, \dots, K$ let $\varepsilon_k \in \mathbb{R}^{dn}$ be the entries of
the matrix $A_k$ rewritten as a vector. For any matrices $A_1, \dots, A_{k-1}$,
$A_{k+1}, \dots, A_K$ the function

\[ f_{A_1,\dots,A_{k-1},A_{k+1},\dots,A_K}(A_k) = \|A_1 \otimes_r \cdots \otimes_r A_{k-1} \otimes_r A_k \otimes_r A_{k+1} \otimes_r \cdots \otimes_r A_K\| \]

is convex. Also, since the absolute values of the entries of the matrices
$A_1, \dots, A_{k-1}, A_{k+1}, \dots, A_K$ do not exceed 1,

\[ \bigl| f_{A_1,\dots,A_{k-1},A_{k+1},\dots,A_K}(A_k) - f_{A_1,\dots,A_{k-1},A_{k+1},\dots,A_K}(A_k') \bigr|
 \le \|A_1 \otimes_r \cdots \otimes_r A_{k-1} \otimes_r (A_k - A_k') \otimes_r A_{k+1} \otimes_r \cdots \otimes_r A_K\| \]
\[ \le \|A_1 \otimes_r \cdots \otimes_r A_{k-1} \otimes_r (A_k - A_k') \otimes_r A_{k+1} \otimes_r \cdots \otimes_r A_K\|_{HS}
 \le d^{(K-1)/2} \|A_k - A_k'\|_{HS}, \]

so the Lipschitz constant of this function doesn't exceed $d^{(K-1)/2}$. By
Corollary 5.2 we can take $\mu = C'(d^{K/2} + n^{1/2})$. Applying Lemma 5.4
with $t = C''(d^{K/2} + n^{1/2})$ finishes the proof. □

Remark 5.5. The probability bound of Theorem 1.3 is optimal. Indeed,
assume first that $d^K \ge n$, and let $A_1, \dots, A_K$ be $d \times n$ matrices
with independent random $\pm 1$ entries. Choose a number $s \in \mathbb{N}$
such that $\sqrt{s} > C'$, where $C'$ is the constant in Theorem 1.3, and set
$x = (e_1 + \dots + e_s)/\sqrt{s}$. With probability $2^{-sdK}$ all entries in the
first $s$ columns of these matrices equal 1, so
$\|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_2 = \sqrt{s}\, d^{K/2}$.

In the opposite case, $n > d^K$, set $s = C'^2 n/d^K$, where the constant
$C'$ is the same as above. Then for $x$ defined above we have
$\|(A_1 \otimes_r \cdots \otimes_r A_K)x\|_2 = \sqrt{s} \cdot d^{K/2} = C'\sqrt{n}$ with probability at least
$2^{-sdK} = \exp(-C''Kn/d^{K-1})$.
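The two deterministic computations behind this optimality argument can be checked directly for small sizes. The sketch below is only an illustration: the function names are ours, the row ordering of the row product is one possible convention, and the entries outside the first $s$ columns are set to $-1$ arbitrarily.

```python
import math
from functools import reduce


def row_product(A, B):
    """Row product: entry-wise products of every row of A with every row of B."""
    return [[a * b for a, b in zip(ra, rb)] for ra in A for rb in B]


def mat_vec(A, x):
    """Ordinary matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


def first_column_norm(mats):
    """Euclidean norm of (A_1 (x)_r ... (x)_r A_K) e_1."""
    P = reduce(row_product, mats)
    e1 = [1.0] + [0.0] * (len(P[0]) - 1)
    return l2_norm(mat_vec(P, e1))


def all_ones_columns_norm(d, n, K, s):
    """Norm of the row product applied to x = (e_1 + ... + e_s) / sqrt(s)
    when the first s columns of every factor consist of ones."""
    A = [[1.0 if j < s else -1.0 for j in range(n)] for _ in range(d)]
    P = reduce(row_product, [A] * K)
    x = [1.0 / math.sqrt(s) if j < s else 0.0 for j in range(n)]
    return l2_norm(mat_vec(P, x))
```

For $\pm 1$ factors the first function returns $d^{K/2}$, and the second returns $\sqrt{s}\, d^{K/2}$, the value used in the remark.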

5.2. Norms of the submatrices. We start with two deterministic 
lemmas. The first one is a trivial bound for the norm of the row 
product of two matrices. 

Lemma 5.6. Let $U$ be an $M \times n$ matrix, and let $V$ be a $d \times n$ matrix.
Assume that $|v_{ij}| \le 1$ for all entries of the matrix $V$. Then
$\|U \otimes_r V\| \le \sqrt{d}\, \|U\|$.

Proof. The matrix $U \otimes_r V$ consists of $d$ blocks $U \otimes_r v_j$, $j = 1, \dots, d$,
where $v_j$ is a row of $V$. For any $x \in \mathbb{R}^n$

\[ \|(U \otimes_r v_j)x\|_2 = \|U(v_j \otimes_r x^T)^T\|_2 \le \|U\| \cdot \|v_j \otimes_r x^T\|_2 \le \|U\| \cdot \|x\|_2. \]

Hence, $\|U \otimes_r V\|^2 \le \sum_{j=1}^d \|U \otimes_r v_j\|^2 \le d\, \|U\|^2$. □
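The bound of Lemma 5.6 is easy to test numerically. The following sketch uses only the standard library and estimates operator norms by power iteration, which is our choice for the illustration and not a method from the paper; the function names are hypothetical. Taking all entries of $V$ equal to 1 gives the equality case.

```python
import math


def row_product(U, V):
    """Row product: entry-wise products of every row of U with every row of V.
    The ordering of the rows does not affect the operator norm."""
    return [[u * v for u, v in zip(row_u, row_v)] for row_u in U for row_v in V]


def operator_norm(A, iters=200):
    """Estimate the spectral norm of A by power iteration on A^T A.
    Sketch only: a start vector orthogonal to the top singular direction would fail."""
    rows, cols = len(A), len(A[0])
    x = [1.0 / math.sqrt(cols)] * cols  # unit start vector
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]  # y = A x
        z = [sum(A[i][j] * y[i] for i in range(rows)) for j in range(cols)]  # z = A^T y
        scale = math.sqrt(sum(t * t for t in z))
        if scale == 0.0:
            return 0.0
        x = [t / scale for t in z]
    y = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(t * t for t in y))


def lemma_5_6_ratio(U, V):
    """Return ||U (x)_r V|| / (sqrt(d) ||U||); Lemma 5.6 bounds it by 1 when |v_ij| <= 1."""
    d = len(V)
    return operator_norm(row_product(U, V)) / (math.sqrt(d) * operator_norm(U))
```

When every row of $V$ is the all-ones vector, $U \otimes_r V$ is $d$ stacked copies of $U$ and the ratio equals 1, so the factor $\sqrt{d}$ cannot be improved in general.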

The second lemma is based on the block decomposition of the coor- 
dinates of a vector. 

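Lemma 5.7. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear operator. Set $L =
[(1/4)\log_2 n]$ and let $1 \le L_0 \le L$. For $l = 1, \dots, L$ denote

\[ \mathcal{M}_l = \{x \in \mathbb{R}^n \mid |\mathrm{supp}(x)| \le 4^l, \text{ and } x(j) \in \{0, 2^{-l}, -2^{-l}\} \text{ for all } j\}. \]

Let $b \le 2^{-L_0}$. Then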

|T : B'i n bB'' ^ K^ll < ( V max ||T^|'^ 

I Z OO 2 II - 1 ^ ^^j^^ II I 

\1=Lq 

Proof. Let x G 5^ n 65^. Let /q, /i, . . . , h-Lo be blocks of type 4^o of 
coordinates of x. Recall that \Lm\ = 4^°+"^. If Xm 7^ 0, set 



m. 



x\ 




M 


1 1 00 



otherwise $y_m = 0$. Then $\|y_m\|_\infty \le |I_m|^{-1/2} = 2^{-(L_0+m)}$ and $\|y_m\|_2 \le 1$,
so $y_m \in \mathrm{conv}(\mathcal{M}_{L_0+m})$ for all $m$. By the Cauchy-Schwarz inequality,



m\\2 

m=0 \ m=0 / \ m=0 

L-Lo \ /L„Lo X 1/2 

m=0 / \ m=0 



||Tx||2 < 5^ ||Tx|,J|2 < I^-I ■ ■ E 11^^^ 



The estimate of Lemma 3.1 completes the proof. □

For $k \in \mathbb{N}$ denote by $\mathcal{W}_k$ the set of all $d^k \times n$ matrices $V$ satisfying

\[ \|V|_J\| \le C_k \Bigl( d^{k/2} + \sqrt{|J|} \cdot \log^{k/2}\Bigl(\frac{en}{|J|}\Bigr) \Bigr) \tag{5.4} \]

for all non-empty subsets $J \subset \{1, \dots, n\}$. Here $V|_J$ denotes the submatrix
of $V$ with columns belonging to $J$, and $C_k$ is a constant depending
on $k$ only. This definition obviously depends on the choice of the constants
$C_k$. These constants will be defined inductively in the proof of
Lemma 5.9 and then fixed for the rest of the paper.

We will prove that the row product of random matrices satisfies
condition (5.4) with high probability. To this end we need an estimate
of the norm of a vector consisting of i.i.d. blocks of coordinates.

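Lemma 5.8. Let $W$ be an $m \times n$ matrix. Let $\theta \in \mathbb{R}^n$ be a vector
with independent $\delta$ random coordinates. For $l \in \mathbb{N}$ let $Y_1, \dots, Y_l$ be
independent copies of the random variable $Y = \|W\theta\|$. Then for any
$s > 0$

\[ \mathbb{P}\Bigl( \sum_{j=1}^{l} Y_j^2 \ge 4l\, \|W\|_{HS}^2 + s \Bigr) \le 2^l \cdot \exp\Bigl(-\frac{cs}{\|W\|^2}\Bigr). \]

Proof. Note that $F : \mathbb{R}^n \to \mathbb{R}$, $F(x) = \|Wx\|$ is a Lipschitz convex
function with the Lipschitz constant $\|W\|$. By Talagrand's theorem

\[ \mathbb{P}(|Y - M| \ge t) \le 4\exp\Bigl(-\frac{t^2}{16\|W\|^2}\Bigr), \]

where $M = M(Y)$ is the median of $Y$. For $j = 1, \dots, l$ set $Z_j =
|Y_j - M|$. Then the previous inequality means that $Z_j$ is a $\psi_2$ random
variable, i.e.

\[ \mathbb{E}\exp\Bigl(\frac{c'Z_j^2}{\|W\|^2}\Bigr) \le 2 \]

for some constant $c' > 0$. By the Chebychev inequality and independence
of $Z_1, \dots, Z_l$,

\[ \mathbb{P}\Bigl( \sum_{j=1}^{l} Z_j^2 > t \Bigr) = \mathbb{P}\Bigl( \exp\Bigl(\frac{c'}{\|W\|^2}\sum_{j=1}^{l} Z_j^2\Bigr) > \exp\Bigl(\frac{c't}{\|W\|^2}\Bigr) \Bigr) \le 2^l \cdot \exp\Bigl(-\frac{c't}{\|W\|^2}\Bigr). \]

Using the elementary inequality $x^2 \le 2(x - a)^2 + 2a^2$, valid for all
$x, a \in \mathbb{R}$, we derive that

\[ \mathbb{P}\Bigl( \sum_{j=1}^{l} Y_j^2 > 2lM^2 + 2t \Bigr) \le \mathbb{P}\Bigl( \sum_{j=1}^{l} Z_j^2 > t \Bigr) \le 2^l \cdot \exp\Bigl(-\frac{c't}{\|W\|^2}\Bigr). \]

By Markov's inequality, $M^2 = M(Y^2) \le 2\,\mathbb{E}Y^2$. To finish the proof,
notice that since the coordinates of $\theta$ are independent,

\[ \mathbb{E}Y^2 = \mathbb{E}\|W\theta\|^2 = \sum_{j=1}^{m} \sum_{k=1}^{n} w_{j,k}^2\, \mathbb{E}\theta_k^2 \le \|W\|_{HS}^2. \]

□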

The next lemma shows that a "typical" row product of random matrices
satisfies (5.4).

Lemma 5.9. Let $d, n, k \in \mathbb{N}$ be numbers satisfying $n \ge d^{k+1/2}$. Let
$A_1, \dots, A_k$ be $d \times n$ matrices with independent $\delta$ random entries. There exist
numbers $C_1, \dots, C_k > 0$ such that

\[ \mathbb{P}\bigl( A_1 \otimes_r \cdots \otimes_r A_k \notin \mathcal{W}_k \bigr) \le k e^{-cd}. \]

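Proof. We use the induction on $k$.

Step 1. Let $k = 1$. In this case $A_1$ is a matrix with independent $\delta$
random entries. For such matrices the result is standard and follows
from an easy covering argument. Let $x \in S^{d-1}$, and let $y \in S^{n-1} \cap
\mathbb{R}^J$. Then $\langle x, A_1|_J\, y \rangle$ is a linear combination of independent $\delta$ random
variables. By Hoeffding's inequality (see e.g. [23]),

\[ \mathbb{P}\bigl( |\langle x, A_1|_J\, y \rangle| > t \bigr) \le e^{-ct^2} \]

for any $t \ge 1$. Let $J \subset \{1, \dots, n\}$, $|J| = m$. Let $\mathcal{N}$ be a $(1/2)$-net in
$S^{d-1}$, and let $\mathcal{M}$ be a $(1/2)$-net in $S^{n-1} \cap \mathbb{R}^J$. Then

\[ \|A_1|_J\| \le 4 \sup_{x \in \mathcal{N}} \sup_{y \in \mathcal{M}} \langle x, A_1|_J\, y \rangle. \]

The nets $\mathcal{N}$ and $\mathcal{M}$ can be chosen so that $|\mathcal{N}| \le 6^d$ and $|\mathcal{M}| \le 6^m$.
Combining this with the union bound, we get

\[ \mathbb{P}(\|A_1|_J\| > 4t) \le |\mathcal{N}| \cdot |\mathcal{M}| \cdot e^{-ct^2} \le \exp\bigl(-ct^2 + (m + d)\log 6\bigr) \le e^{-c't^2} \]

provided that $t \ge C(\sqrt{d} + \sqrt{m})$. Let

\[ t = t_m = \tau \cdot \Bigl( \sqrt{d} + \sqrt{m}\,\sqrt{\log\frac{en}{m}} \Bigr), \]

with $\tau > C$ to be chosen later, and set $C_1 = 4\tau$. Taking the union
bound, we get

\[ \mathbb{P}(A_1 \notin \mathcal{W}_1) \le \sum_{m=1}^{n} \sum_{|J|=m} \mathbb{P}(\|A_1|_J\| > 4t_m) \le \sum_{m=1}^{n} \binom{n}{m} e^{-c't_m^2}
 \le \sum_{m=1}^{n} \exp\Bigl( -c'\tau^2 \cdot \Bigl( \sqrt{d} + \sqrt{m}\,\sqrt{\log\frac{en}{m}} \Bigr)^2 + m\log\frac{en}{m} \Bigr). \]

We can choose the constant $\tau$ so that the last expression doesn't exceed
$e^{-cd}$.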
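Step 2. Let $k > 1$, and assume that $C_1, \dots, C_{k-1}$ are already defined.
It is enough to find $C_k > 0$ such that for any $U \in \mathcal{W}_{k-1}$ with $|u_{ij}| \le 1$
for all $i, j$

\[ \mathbb{P}(U \otimes_r A_k \notin \mathcal{W}_k) \le e^{-cd}. \tag{5.5} \]

Indeed, in this case

\[ \mathbb{P}\bigl( A_1 \otimes_r \cdots \otimes_r A_k \notin \mathcal{W}_k \bigm| A_1 \otimes_r \cdots \otimes_r A_{k-1} \in \mathcal{W}_{k-1} \bigr) \le e^{-cd}. \]

Hence, the induction hypothesis yields

\[ \mathbb{P}\bigl( A_1 \otimes_r \cdots \otimes_r A_k \notin \mathcal{W}_k \bigr)
 \le \mathbb{P}\bigl( A_1 \otimes_r \cdots \otimes_r A_k \notin \mathcal{W}_k \bigm| A_1 \otimes_r \cdots \otimes_r A_{k-1} \in \mathcal{W}_{k-1} \bigr)
 + \mathbb{P}\bigl( A_1 \otimes_r \cdots \otimes_r A_{k-1} \notin \mathcal{W}_{k-1} \bigr) \le k e^{-cd}. \]

Fix $U \in \mathcal{W}_{k-1}$. To shorten the notation denote $W = U \otimes_r A_k$. For
$j \in \mathbb{N}$ define $m_j$ as the smallest number $m$ satisfying

\[ d^j \le m \log^j\Bigl(\frac{en}{m}\Bigr). \]

Our strategy of proving (5.5) will depend on the cardinality of the set
$J \subset \{1, \dots, n\}$ appearing in (5.4).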

Consider first any set $J$ such that $|J| \le m_{k-1}$. By Lemma 5.6

\[ \|W|_J\| \le \sqrt{d}\, \|U|_J\| \le \sqrt{d} \cdot C_{k-1} \Bigl( d^{(k-1)/2} + \sqrt{|J|}\, \log^{(k-1)/2}\Bigl(\frac{en}{|J|}\Bigr) \Bigr), \]

and so $W$ satisfies the condition $\mathcal{W}_k$ with $C_k = 2C_{k-1}$ for all such $J$.

Now consider all sets $J$ such that $m_{k-1} < |J| \le m_k$. The previous
argument shows that any vector $y \in S^{n-1}$ with $|\mathrm{supp}(y)| \le m_{k-1}$
satisfies $\|Wy\| \le 2C_{k-1} d^{k/2}$. Any $x \in S^{n-1}$ can be decomposed as
$x = y + z$, where $|\mathrm{supp}(y)| \le m_{k-1}$ and $\|z\|_\infty \le m_{k-1}^{-1/2}$. Therefore, to
prove (5.5), it is enough to show that



P (3 J C {1, . . . , n} mfc_i < I J| < mfc and 



> Cd""]) < e 



-cd 



To this end take any $z \in S^{n-1}$ such that $|\mathrm{supp}(z)| \le m_k$ and
$\|z\|_\infty \le m_{k-1}^{-1/2}$. We will obtain a uniform bound on $\|Wz\|_2$ over all such $z$, and
use the $\varepsilon$-net argument to derive a bound for $\|W|_J\|$ from it.
Let $M$ be the minimal natural number such that $4^M m_{k-1} \ge m_k$.
Let $I_0, \dots, I_M$ be blocks of type $m_{k-1}$ of the coordinates of $z$. Since
$U \in \mathcal{W}_{k-1}$, for any $m \le M$

||t/|/„J|' < Ck^i{d^-' + \Im\\og''-\en/\Im\)) < 2Cu-i\Im\\og''-\en), 

because $|I_m| \ge m_{k-1}$.

Let $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ be a row of the matrix $A_k$. Then the coordinates
of the vector $Wz$ corresponding to this row form the vector $(U \otimes_r \varepsilon)z =
(U \otimes_r z^T)\varepsilon^T$. Let $U'$ be the $d^{k-1} \times |J|$ matrix defined as

\[ U' = (U \otimes_r z^T)|_J. \]

The inequality above and Lemma 3.1 imply

M M 

<10Cfc_ilog('=-i)/2 ^^^^ 
Also, since all entries of U have absolute value at most 1, 

The sequence of coordinates of the vector $Wz$ consists of $d$ independent
copies of $U'\varepsilon^T$. Therefore, applying Lemma 5.8 with $l = d$ and $s = td^k$,
we get

\[ p(z) := \mathbb{P}\bigl( \|Wz\|^2 > (4 + t) \cdot d^k \bigr) \le 2^d \exp\Bigl(-\frac{c\,td^k}{\|U'\|^2}\Bigr) \le 2^d \exp\Bigl(-\frac{td^k}{c_k' \log^{k-1}(en)}\Bigr), \]

where $c_k' = 4C_{k-1}^2/c$. By the volumetric estimate, we can construct a
$(1/2)$-net $\mathcal{N}$ for the set

\[ E_k := \{ z \in S^{n-1} \mid |\mathrm{supp}(z)| \le m_k,\ \|z\|_\infty \le m_{k-1}^{-1/2} \} \]

in the Euclidean metric, such that

\[ |\mathcal{N}| \le \binom{n}{m_k} 6^{m_k} \le \exp\bigl(2m_k \log(en)\bigr). \]

Since $n \ge d^{k+1/2}$ and $m_k \le d^k$, we have $\log(en) \le 2k\log(en/m_k)$, and
so

\[ m_k \log(en) \le (2k)^k\, \frac{d^k}{\log^{k-1}(en)}. \]

Hence, we can choose the constant $t = t_k$ large enough, so that

\[ \mathbb{P}\bigl( \exists z \in \mathcal{N} \mid \|Wz\|^2 > C_k' d^k \bigr) \le |\mathcal{N}| \cdot 2^d \exp\Bigl(-\frac{td^k}{c_k' \log^{k-1}(en)}\Bigr) \le \exp\Bigl(-\frac{d^k}{\log^{k-1}(en)}\Bigr) \]

with the constant $C_k' = 4 + t_k$. Thus,

\[ \mathbb{P}\bigl( \exists z \in E_k \mid \|Wz\|^2 > 4C_k' d^k \bigr) \le \exp\Bigl(-\frac{d^k}{\log^{k-1}(en)}\Bigr), \]

which implies condition (5.4) with $C_k = (4C_{k-1}^2 + 4C_k')^{1/2}$ for all sets $J$
such that $|J| \le m_k$.

Finally, consider any set $J$ with $|J| > m_k$. As in the previous case, we
can split any vector $x \in S^{n-1}$ as $x = y + z$, where $|\mathrm{supp}(y)| \le m_k$ and
$\|z\|_\infty \le m_k^{-1/2}$. The previous argument shows that with probability
greater than $1 - \exp\bigl(-d^k/\log^{k-1}(en)\bigr)$,

\[ \|Wy\| \le (4C_{k-1}^2 + 4C_k')^{1/2} d^{k/2} \]

for all such $y$. Therefore, it is enough to estimate $\max \|W|_J\, z\|$ over
$z \in B_2^n \cap m_k^{-1/2} B_\infty^n$. A $(1/2)$-net in the set $B_2^n \cap m_k^{-1/2} B_\infty^n$ is too big,
so following the argument used in the previous case would lead to
losses that break down the proof. Instead, we will use the sets $\mathcal{M}_l$
defined in Lemma 5.7 and obtain the bounds for $\max \|W|_J\, z\|$ for each
set separately.

To this end, set $b = 1/\sqrt{m_k}$, and let $L_0$ be the largest number such
that $2^{-L_0} \ge b$. Let $l \ge L_0$ and take any $x \in \mathcal{M}_l$. Choose any set
$I \supset \mathrm{supp}(x)$ such that $|I| = 4^l$. As in the previous case, let $U'$ be the
$d^{k-1} \times 4^l$ matrix defined as

\[ U' = (U \otimes_r x^T)|_I. \]

Since all non-zero coordinates of $x$ have absolute value $2^{-l} = 1/\sqrt{|I|}$,
the assumption $U \in \mathcal{W}_{k-1}$ implies

\[ \|U'\| \le \frac{1}{\sqrt{|I|}}\, \|U|_I\| \le \frac{C_{k-1}}{\sqrt{|I|}} \Bigl( d^{(k-1)/2} + \sqrt{|I|}\, \log^{(k-1)/2}\Bigl(\frac{en}{|I|}\Bigr) \Bigr) \le 2C_{k-1} \log^{(k-1)/2}(en \cdot 4^{-l}). \]

The last inequality holds since for any $m \ge m_k \ge m_{k-1}$

\[ d^{k-1} \le m \log^{k-1}\Bigl(\frac{en}{m}\Bigr). \]

Also, as before, all entries of $U$ have absolute value at most 1, so
$\|U'\|_{HS}^2 \le d^{k-1}$. The sequence of coordinates of the vector $Wx$ consists
of $d$ independent copies of $U'\varepsilon^T$. Therefore, applying Lemma 5.8 we
get

\[ \mathbb{P}\bigl( \|Wx\|^2 > 4d \cdot d^{k-1} + s \bigr) \le 2^d \exp\Bigl(-\frac{cs}{\|U'\|^2}\Bigr) \le 2^d \exp\Bigl(-\frac{s}{c_k' \log^{k-1}(en \cdot 4^{-l})}\Bigr), \]

where $c_k' = 4C_{k-1}^2/c$. Set

\[ s = s(l) = 2c_k' \cdot 4^l \log^k(en \cdot 4^{-l}). \]

Then $s(l) \ge 2c_k' m_k \log^k(en/m_k) \ge 2c_k' d^k$, so the previous inequality
can be rewritten as

P(||Vra;f >Cfcs(/)) <exp (-2 -4' log (en -4-')). 

Hence, the union bound implies that there exists a constant Ck satis- 
fying 

P(3/ > Lo 3x G Ml \\Wx\\ > Cks{l)) 

- £ (m) ' ■ {en ■ 4-')) < exp (-4^° log (en ■ 4-^°)) 

l=Lo ^ ^ 

< exp(-/). 

Define the event Qi by 

fii = {V/ > Lo Vx G A^i \\Wx\\ < s{l)}. 

The previous inequality means that P(f2'{) < exp(— c/^). 

Assume that the event $\Omega_1$ occurs. Let $J \subset \{1, \dots, n\}$ be such that
$|J| > m_k$, and choose $L'$ so that $4^{L'-1} < |J| \le 4^{L'}$. Applying Lemma
5.7 to $T = W|_J$ and $b = 1/\sqrt{m_k}$, we obtain



L' 



<5clJ2sil)<C'^A^'\og'' 



This shows that condition (5.4) holds with $C_k = (4C_{k-1}^2 + 4C_k' + 4C_k'')^{1/2}$
for all non-empty sets $J \subset \{1, \dots, n\}$. This completes the induction
step and the proof of Lemma 5.9.

□

5.3. Lower bounds for the Q-norm. To obtain bounds for the Levy
concentration function below, we need a lower estimate for a certain
norm of the row product of random matrices.

Definition 5.10. Let $U = (u_{j,k})$ be an $M \times m$ matrix. Denote

\[ \|U\|_Q = \sum_{j=1}^{M} \Bigl( \sum_{k=1}^{m} u_{j,k}^2 \Bigr)^{1/2}. \]

In other words, $\|\cdot\|_Q$ is the norm in the Banach space $\ell_1^M(\ell_2^m)$.
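The Q-norm is the sum of the Euclidean norms of the rows. A minimal Python sketch follows; the function names are ours, and `scale_columns` is our shorthand for the row product of a matrix with a single row vector $x^T$.

```python
import math


def q_norm(U):
    """Sum over the rows of U of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(u * u for u in row)) for row in U)


def hs_norm(U):
    """Hilbert-Schmidt (Frobenius) norm of U."""
    return math.sqrt(sum(u * u for row in U for u in row))


def scale_columns(U, x):
    """Row product of U with the single row x^T: column k of U is multiplied by x[k]."""
    return [[u * t for u, t in zip(row, x)] for row in U]
```

Comparing the $\ell_1$ and $\ell_2$ norms of the vector of row norms gives $\|U\|_{HS} \le \|U\|_Q \le \sqrt{M}\, \|U\|_{HS}$. If all entries of $U$ are $\pm 1$, every row of `scale_columns(U, x)` has Euclidean norm $\|x\|_2$, so its Q-norm equals $M \|x\|_2$.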

If [/ is an M X m matrix with independent centered entries of unit 
variance, then for any x G M", 

M / m \ 1/2 

i=i V fc=i / 

Moreover, if the coordinates of x are commensurate, we can expect 
that a reverse inequality would follow from the Central Limit theorem. 
This observation leads to the following definition. 

Let $\mathcal{V}_L$ be the set of $d^L \times n$ matrices $A$ such that for any $x \in \mathbb{R}^n$

\[ \|A \otimes_r x^T\|_Q \ge c\, d^L \|x\|_2. \tag{5.6} \]

We will show below that the row product of $L$ independent $d \times n$
random matrices belongs to $\mathcal{V}_L$ with high probability, provided that
the constant $c$ in (5.6) is appropriately chosen. To this end, consider
the behavior of $\|(A_1 \otimes_r \cdots \otimes_r A_L) \otimes_r x^T\|_Q$ for a fixed vector $x \in \mathbb{R}^n$.

Lemma 5.11. Let $A_1, \dots, A_L$ be $d \times m$ random matrices with independent
$\delta$ random entries. Then for any $x \in \mathbb{R}^m$

\[ \mathbb{P}\bigl( \|(A_1 \otimes_r \cdots \otimes_r A_L) \otimes_r x^T\|_Q \le c\, d^L \|x\|_2 \bigr) \le \exp\Bigl(-c\, d^L \frac{\|x\|_2^2}{\|x\|_\infty^2}\Bigr). \]

Proof. Without loss of generality, assume that = 1, so ||x||2 > 1. 

Let a > 0, and let i^i, . . . , z/^ G [0, 1] be independent random variables 
satisfying Kuj > a for all j = 1, . . . ,n. The standard symmetrization 
and Bernstein's inequality [23] yield 



P(l5^a:2(j>,-E^x2(j>,| >t) <2exp -;^7^^ 
Setting t = (a/2) ||x||2, and using < 1, we get 

m 



x^{j)+t/3) 



P \f2^\j>, < f < 2exp (^-^ ||a;||^^ 
Applying the previous inequality to the random variable $Y_i$, $i = 1, \dots, d^L$,
which is the $\ell_2$-norm of a row of the matrix $(A_1 \otimes_r \cdots \otimes_r A_L) \otimes_r x^T$, we
obtain $\mathbb{P}(Y_i \le c\|x\|_2) \le 2\exp(-c'\|x\|_2^2)$. Let $0 < \theta < 1$. If



i=l 

then < c ||a;||2 for at least [1 — 6)d^ numbers i. Hence, 



12 

P (^||(Ai ®r • • • Al) 'S)rx\\Q < Ocd^ \\x\\^ 
< iu. '^l,,,lexp(-c(l-^M^||x||; 



,L(i-^M^J, 

< exp (-rf^(c(l - ^) ||x||2 - ^log 0) < exp(-(c/2)c/^ ||a;||2), 
if 6* is small enough. □ 

We will use Lemma 5.11 to show that the row product $A_1 \otimes_r \cdots \otimes_r
A_{K-1}$ satisfies condition (5.6) with high probability.

Lemma 5.12. There exists a constant $c > 0$ for which the following
holds. Let $K > 1$, and let $n \le d^K$. Let $A_1, \dots, A_{K-1}$
be $d \times n$ matrices with independent $\delta$ random entries. Then

\[ \mathbb{P}\bigl( A_1 \otimes_r \cdots \otimes_r A_{K-1} \notin \mathcal{V}_{K-1} \bigr) \le \exp(-cd^{K-1}). \]

Proof. Denote for shortness $\tilde{A} = A_1 \otimes_r \cdots \otimes_r A_{K-1}$. To conclude that
$\tilde{A} \in \mathcal{V}_{K-1}$, it is enough to show that condition (5.6) holds for any
$x \in S^{n-1}$.

For $x \in S^{n-1}$ denote by $\Omega(x)$ the set of matrices $A$ such that
$\|A \otimes_r x^T\|_Q \le cd^{K-1}$. For $L = K - 1$ Lemma 5.11 yields

\[ \mathbb{P}(\tilde{A} \in \Omega(x)) \le \exp\Bigl(-\frac{cd^{K-1}}{\|x\|_\infty^2}\Bigr). \tag{5.7} \]

As the first step in proving the lemma, we will show that for $A = \tilde{A}$
condition (5.6) holds for all $x$ from some subset of the sphere. More
precisely, we will prove the following claim.

Claim. Let a > and m < n. Denote 

S{a,m) = {x E S^~^ I < a, |supp(x)| < m}. 

If a'^mlogd < Cd^~^, then 

p(A^ fl nix)) < exp f-^^y 
It is enough to prove the claim for $0 < a < 1$. Note that if $0 \le
|y(j)| \le |x(j)|$ for any $j = 1, \dots, n$, then $\|A \otimes_r y^T\|_Q \le \|A \otimes_r x^T\|_Q$.

Hence, to prove the claim, it is enough to construct a set $\mathcal{N}$ of vectors
$y \in B_2^n \setminus (1/2)B_2^n$ such that for any $x \in S(a,m)$ there is $y \in \mathcal{N}$ with
$|y(j)| \le |x(j)|$ for all $j$ and



c'd 



a? 



M={ye{ ) I |supp(y)| < m, \\y\\^ < a and - < < 1 



2a2 



P(A^ f]n{y))<exp 
yeAf 

Set 

1 

,2v^y " ' - - 2 

By the volumetric considerations 

\J^\< |C"" < exp(cmlogra) < exp (C'm log ci), 
\m J 

since n < . For x G S{a,m) consider the vector y with coordinates 
= (1/20") ■ L2ym|a:(j)|J- Then < and ||y||2 > 

1 — ||x — yW^ > 1/2, so y G A/". By the union bound and {\5.7\i . 

f cd^^^' 
P(A^ fll](y))<|Ar|exp - 

The claim now follows from the assumption a^m log c? < Cd^~^ for a 
suitable constant C. 

The lemma can be easily derived from the claim. For a and m as 
above denote r2(a,m) = flxesia m) ^(^)- 

= 3c/(^~^)^/^ m, = min (c/*^/^ n) , i = 1, 2, 3. 

Then 7713 = n, and the condition a^rriilogd < Cd^~^, i = 1,2,3 is 
satisfied. Set 

3 

V = P|fi(ai,mi). 

By the claim, PCl^'^) ^ exp{-cd^-^). 

Assume now that A G V. Using the non-increasing rearrangement 
of we can decompose any x G S"""^ as x = xi + X2 + X3, where 

Xi,X2,X3 have disjoint supports, |supp(xi)| < m^, ||a;j||^ < aj/3. By 



the triangle inequality, WxiW^ > 1/3 for some i. Thus 

\ II A ^ 



^ \\Q>\\^®rXi\\^> 



T 

A®. 



l-^i II2 



. 1 > ^d--, 
Q 3-3 



since Xi/ \\xi\\2 G S{ai,mi). This proves the Lemma with c = c/3. □ 
6. Bounds for the Levy concentration function

Definition 6.1. Let $\rho > 0$. Define the Levy concentration function of
a random vector $X \in \mathbb{R}^n$ by

\[ \mathcal{L}_1(X, \rho) = \sup_{x \in \mathbb{R}^n} \mathbb{P}(\|X - x\|_1 \le \rho). \]

Unlike the standard definition of the Levy concentration function,
we use the $\ell_1$-norm instead of the $\ell_2$-norm. We need the following
standard

Lemma 6.2. Let $X \in \mathbb{R}^n$ be a random vector, and let $X'$ be an independent
copy of $X$. Then for any $\rho > 0$

\[ \mathcal{L}_1(X, \rho) \le \mathbb{P}^{1/2}(\|X - X'\|_1 \le 2\rho). \]

Proof. Let $y \in \mathbb{R}^n$ be any vector. Then

\[ \mathbb{P}^2(\|X - y\|_1 \le \rho) = \mathbb{P}\bigl( \|X - y\|_1 \le \rho \text{ and } \|X' - y\|_1 \le \rho \bigr) \le \mathbb{P}(\|X - X'\|_1 \le 2\rho). \]

Taking the supremum over $y \in \mathbb{R}^n$ proves the Lemma. □
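For a vector of the form $X = U\varepsilon$ with $\varepsilon$ uniform on $\{-1, 1\}^n$ and small $n$, both sides of the inequality in the proof can be computed exactly by enumeration. The sketch below does this with exact rational arithmetic. The function names are ours, the uniform sign distribution is just one convenient example, and the check runs over a finite list of centers $y$ rather than the full supremum.

```python
from fractions import Fraction
from itertools import product


def outcomes(U):
    """All equally likely values of X = U eps, with eps uniform on {-1, 1}^n."""
    n = len(U[0])
    return [tuple(sum(u * e for u, e in zip(row, eps)) for row in U)
            for eps in product((-1, 1), repeat=n)]


def l1_dist(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def small_ball_prob(points, y, rho):
    """Exact P(||X - y||_1 <= rho) for X uniform on the list `points`."""
    return Fraction(sum(l1_dist(p, y) <= rho for p in points), len(points))


def pair_prob(points, rho):
    """Exact P(||X - X'||_1 <= 2 rho) for independent copies X, X'."""
    close = sum(l1_dist(p, q) <= 2 * rho for p in points for q in points)
    return Fraction(close, len(points) ** 2)


def check_lemma_6_2(U, rho, candidates):
    """Check P(||X - y||_1 <= rho)^2 <= P(||X - X'||_1 <= 2 rho) for each candidate y."""
    pts = outcomes(U)
    bound = pair_prob(pts, rho)
    return all(small_ball_prob(pts, y, rho) ** 2 <= bound for y in candidates)
```

Integer matrices and an integer radius keep every comparison exact, so no floating-point tolerance is involved.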

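In the next lemma, we bound the Levy concentration function using
Talagrand's inequality, in the same way it was done in the proof of
Lemma 5.8.

Lemma 6.3. Let $U = (u_{ij})$ be any $N \times n$ matrix, and let $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$
be a vector with independent $\delta$ random coordinates. Then for any
$x \in \mathbb{R}^n$

\[ \mathcal{L}_1\bigl( (U \otimes_r \varepsilon^T)x,\ c\, \|U \otimes_r x^T\|_Q \bigr) \le 2\exp\Bigl(-c'\, \frac{\|U \otimes_r x^T\|_Q^2}{N \|U \otimes_r x^T\|^2}\Bigr). \tag{6.1} \]

Proof. Note that $(U \otimes_r \varepsilon^T)x = (U \otimes_r x^T)\varepsilon$. Let $\varepsilon_1', \dots, \varepsilon_n'$ be independent
copies of $\varepsilon_1, \dots, \varepsilon_n$. Applying Lemma 6.2 we obtain for any
$\rho > 0$

\[ \mathcal{L}_1\bigl( (U \otimes_r x^T)\varepsilon, \rho \bigr) \le \mathbb{P}^{1/2}\bigl( \|(U \otimes_r x^T)(\varepsilon - \varepsilon')\|_1 \le 2\rho \bigr). \tag{6.2} \]

Consider a function $F : \mathbb{R}^n \to \mathbb{R}$, defined by

\[ F(y) = \|(U \otimes_r x^T)y\|_1, \]

where $y \in \mathbb{R}^n$. Then $F$ is a convex function with the Lipschitz constant

\[ L \le \|U \otimes_r x^T : B_2^n \to B_1^N\| \le \sqrt{N}\, \|U \otimes_r x^T\|. \]

By Talagrand's measure concentration theorem

\[ \mathbb{P}\bigl( |F(\varepsilon - \varepsilon') - M(F)| \ge s \bigr) \le 4\exp\Bigl(-\frac{cs^2}{L^2}\Bigr), \]

where $M(F)$ is a median of $F$, considered as a function on $\mathbb{R}^n$ equipped
with the probability measure defined by the vector $\varepsilon - \varepsilon'$. This tail
estimate implies

\[ |M(F) - \mathbb{E}F| \le c_1 L \le c_1 \sqrt{N}\, \|U \otimes_r x^T\|. \]

By Lemma 2.6 [10] we have

\[ \mathbb{E}F = \mathbb{E} \sum_{i=1}^{N} \Bigl| \sum_{j=1}^{n} u_{ij}\, x(j) \cdot (\varepsilon(j) - \varepsilon'(j)) \Bigr| \ge c_2 \sum_{i=1}^{N} \Bigl( \sum_{j=1}^{n} u_{ij}^2\, x^2(j) \Bigr)^{1/2} = c_2\, \|U \otimes_r x^T\|_Q. \]

Note that if the constant $c'$ in the formulation of the lemma is chosen
small enough, we may assume that $2c_1 \sqrt{N}\, \|U \otimes_r x^T\| \le c_2\, \|U \otimes_r x^T\|_Q$.
Indeed, if this inequality does not hold, the right-hand side of (6.1)
would be greater than 1. Combining the previous estimates yields
$M(F) \ge (c_2/2)\, \|U \otimes_r x^T\|_Q$. Hence,

\[ \mathbb{P}\Bigl( \|(U \otimes_r x^T)(\varepsilon - \varepsilon')\|_1 \le \frac{c_2}{4}\, \|U \otimes_r x^T\|_Q \Bigr)
 \le \mathbb{P}\Bigl( |F(\varepsilon - \varepsilon') - M(F)| \ge \frac{1}{2} M(F) \Bigr) \]
\[ \le 4\exp\Bigl(-\frac{c\,M^2(F)}{4L^2}\Bigr) \le 4\exp\Bigl(-c'\, \frac{\|U \otimes_r x^T\|_Q^2}{N \|U \otimes_r x^T\|^2}\Bigr). \]

This inequality and (6.2), applied with $\rho = \frac{c_2}{8}\, \|U \otimes_r x^T\|_Q$, finish the
proof. □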

For the next result we need the following standard Lemma. 

Lemma 6.4. Let $s_1, \dots, s_d$ be independent non-negative random variables
such that $\mathbb{P}(s_j \le R) \le p$ for all $j$. Then



Proof. If $\sum_{j=1}^{d} s_j \le \frac{1}{2}Rd$, then $s_j \le R$ for at least $d/2$ numbers $j$. □
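The proof rests on a deterministic counting step: non-negative numbers whose sum is at most $\frac{1}{2}Rd$ cannot have more than half of their terms above $R$. A brute-force check of this step over small integer grids can be sketched as follows. The function names and the grids are ours, and the code illustrates only the counting step, not the probability bound of the lemma.

```python
from itertools import product


def count_small(s, R):
    """Number of coordinates of s that are at most R."""
    return sum(1 for v in s if v <= R)


def counting_claim_holds(values, d, R):
    """Exhaustively check over all tuples s in values^d:
    if sum(s) <= R * d / 2, then at least d / 2 of the s_j satisfy s_j <= R."""
    for s in product(values, repeat=d):
        if 2 * sum(s) <= R * d and 2 * count_small(s, R) < d:
            return False  # a counterexample would end up here
    return True
```

The comparisons are written with integers (`2 * sum(s) <= R * d`) so that odd `d` is handled without rounding.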

Combining Lemma 6.3 with this inequality, we obtain the tensorized
version of Lemma 6.3.

Corollary 6.5. Let $U = (u_{ij})$ be any $N \times n$ matrix, and let $V$ be a
$d \times n$ matrix with independent $\delta$ random coordinates. Then for any
$x \in \mathbb{R}^n$

(6.3) 



Ci[{U ®rV)x,cd\\U ®rx\\(^ <C2'^e^^\ -c'- 



T||2 



Proof. The coordinates of the vector $(U \otimes_r V)x \in \mathbb{R}^{Nd}$ consist of $d$
independent blocks $(U \otimes_r \varepsilon_1)x, \dots, (U \otimes_r \varepsilon_d)x$, where $\varepsilon_1, \dots, \varepsilon_d$ are
the rows of $V$. The corollary follows from Lemma 6.4, applied to the
random variables $s_j = \|(U \otimes_r \varepsilon_j)x - y_j\|_1$, where $y_1, \dots, y_d \in \mathbb{R}^N$ are
any fixed vectors. □



To prove Theorem 1.6 we have to bound the probability that the matrix
$A_1 \otimes_r \cdots \otimes_r A_K$ maps some vector from the unit sphere into a small
$\ell_1$ ball. Before doing that, we consider an easier problem of estimating
the probability that this matrix maps a fixed vector into a small $\ell_1$ ball.
We phrase this estimate in terms of the Levy concentration function.

Lemma 6.6. Let $U \in \mathcal{W}_{K-1} \cap \mathcal{V}_{K-1}$ be a $d^{K-1} \times n$ matrix, and let $A_K$
be a $d \times n$ random matrix with independent $\delta$ random entries. For any
$x \in \mathbb{R}^n$



Cl {{U®r AK)x,cd^ \\x\\^) 

( c"d\\x\\l\ I c'd"" \ 

Proof. To use Corollary 6.5 we have to estimate the Q-norm and the
operator norms of $U \otimes_r x^T$. The estimate of the Q-norm is given by
(5.6).

To estimate the operator norm, assume that ||a:||2 = 1, and set s = 
[ll^lloo^J • Let L be the maximal number / such that 2's < n, and let 
Jo, . . . , /l be the blocks of coordinates of x of type s. Then ||x| ||^ < 
2~' and by Lemma [3.11 



j=0 



X J, 



< 5. 
Let $y \in \mathbb{R}^n$. By the Cauchy-Schwarz inequality, we have

2 



1=0 

1=0 
L 



< 



.1=0 



II/II2' 



\y\ji\\l 



, 1=0 



which means 



T||2 



< 



En 



< 



Jl. 



I 11^ 

Mi 1 1 00 



/=0 1=0 

Since [/ G W/^-i, and |J;| > |Ji| = s for all / < L, the previous 



inequality implies 

.T||2 



u 



X 



i Moo 



<C d 



Therefore, by Corollary 16.51 and condition (15.61) . 

£1 {{U®.r AK)x,cd^) 



< exp 



d^- 



\x\ 



+ log--(f) 



The lemma follows from an elementary inequality exp( — ^) < exp( — ^) + 



exp( 



2c' 



□ 



7. Lower bounds via the chaining argument 



To get a global bound for the Levy concentration function using
the bounds for each fixed vector, we prove a chaining-type estimate.
The chaining argument is one of the main approaches to obtaining bounds
for the supremum of a random process [21]. Let $\{X_t \mid t \in T\}$ be a
random process indexed by a set $T$. The chaining method is based on
representing $X_t$ as a sum of increments, proving an upper estimate
for each increment separately, and combining these estimates using the
union bound.

A similar approach, based on passing from a random variable to increments,
can be applied to estimating the infimum of a random process
as well. In this case we isolate one "big" increment, whose position in
the chain depends on $t$. The rest of the increments are divided into two
groups. In one group the increments are small, and we can bound their
absolute values above and use the triangle inequality. The increments
from the other group may be big, but they belong to a small set of
random variables. In such a situation, we can condition on these increments,
and obtain a lower bound on the conditional probability using
the Levy concentration function of the "big" increment. Then we sum
up these conditional probabilities over the small set. As usual for the
chaining method, this step requires a balance between the estimate of
the Levy concentration function and the size of the set.

Lemma 7.1. Let R > 0, a E {0, 1/2) and let be a sequence of 

natural numbers such that /q = 1 and /j+i > 2lj for all j = 0, . . . , L. Set 
n = Ij- Let A : M" — 7- be a random matrix with independent 



columns. Assume that for any j = 1, . . . , L there exists pj > such 

/ • llrll < 

Qen 



that for any x G S*" ^ with |supp(x)| < L, ||a;|| < / • 



(7.1) Ci{Ax,R)<pj< 
Then for any y G 



P ( 3a; G 5"^^ \\Ax - y\l < ^ ) < p^ + P ( ll^ll > ^ 

Proof. Denote \\A\\^^^ = \\A:B^^ B^\\. Let j G {1, . . . , L} and let 
J be a /j-element subset of {1, . . . , L}. Denote 

Sj = {xE S^-' I |supp(x) C J, < Ijli'}. 

Set ruj = Since the sequence {lj}j'=i increases exponentially, 

nij < Ij. We will need the following 

Claim. Let y G M^. Let 

Qj = {w I, II2 < 2a^~^ , supp(tf) n J = 0, |supp(tf)| < mj}. 
Then 

P (3^ eQj + Sj \\Az -y\\^<R-a \\A\\,^^) < pf. 

By the volumetric estimate we can choose an (a/2)-net Aij in Sj 
such that \M.j\ < {G/aY^. 

Take any $x \in \mathcal{M}_j$ and $w \in Q_j$. Denote $y' = y - Aw$. Then the
vectors $Ax$ and $y'$ are independent. Conditioning on the columns of $A$
with indexes from $\mathrm{supp}(w)$, and using (7.1), we get

\[ \mathbb{P}\bigl( \|A(w + x) - y\|_1 \le R \bigm| A|_{J^c} \bigr) \le \mathbb{P}\bigl( \|Ax - y'\|_1 \le R \bigm| A|_{J^c} \bigr) \le p_j. \]



^R\ , 1/2 . ™ R 
Taking the expectation with respect to $A|_{J^c}$ yields

\[ \mathbb{P}\bigl( \|A(w + x) - y\|_1 \le R \bigr) \le p_j. \]

The volumetric estimate guarantees the existence of a (a/2)-net A/j 
in Qj such that 

< ( " ) (6a-0"" < ( ^ 

yrrij/ ^ ' \a^mj 

Since rrij < Ij, the last quantity does not exceed ^|ff-j • By the 
union bound and assumption (17.11) . 

P (3x G Mj 3w e Afj \\A{w + x)-y\\^< R) 

Assume that a point x' + w' & Sj + Qj satisfies + x') — y'\\^ < 

R — a \\A\\^_^^. Then, approximating it by a point x + w G Aij + Afj, 
such that \\x' + w' — x — wW^ < a, we get \\A{w + x) — y\\-^^ < R. This, 
in combination with the probability estimate above, proves the claim. 
Applying the union bound again, we see that the event 

n = {3j < n 3J C {1, . . . ,n} \ J\ = Ij 3x E Sj 3w E Qj 
\\A{w + x) - yW-^^ < R - a H^Ha^i) 

satisfies 

P(")<EEpf S max .^Pf. 

i=i \J\=h ""' ^ i=i 

By condition ([71]), ^ < vT ^ V2. The same condition and the 

exponential growth of Ij show also that the sequence {p^'^^j^L^ decays 
exponentially, and Xlj^iPj^^ — This implies 

(7.2) F{Q)<py\ 

Now let $x \in S^{n-1}$ be any point. Let $\pi : \{1, \dots, n\} \to \{1, \dots, n\}$
be a permutation rearranging the absolute values of the coordinates
of $x$ in the non-increasing order: $|x_{\pi(1)}| \ge |x_{\pi(2)}| \ge \dots \ge |x_{\pi(n)}|$.
Let $I_1 \cup I_2 \cup \dots \cup I_L = \{1, \dots, n\}$ be the decomposition of $\{1, \dots, n\}$
into a disjoint union of consecutive intervals such that $|I_j| = l_j$. Set
$J_j = \pi^{-1}(I_j)$. In other words, the set $J_1$ contains $l_1$ largest coordinates
of $x$, $J_2$ contains $l_2$ next largest etc. Let $x_j$ be the coordinate projection
of $x$ to $J_j$, i.e. $x_j(i) = x(i) \cdot \mathbf{1}_{J_j}(i)$. Since the largest coordinate of $x_j$
has the position $\sum_{i=1}^{j-1} l_i + 1$ in the non-increasing rearrangement, and
$\|x\|_2 = 1$, we conclude that

\[ \|x_j\|_\infty \le \Bigl( \sum_{i=1}^{j-1} l_i + 1 \Bigr)^{-1/2}. \]
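The decomposition of $x$ by the non-increasing rearrangement can be sketched in a few lines of Python. The function names and the tie-breaking rule (a stable sort) are our choices for this illustration. The sup-norm bound rests on the fact that the $m$-th largest coordinate of a unit vector has absolute value at most $m^{-1/2}$.

```python
import math


def rearrangement_blocks(x, sizes):
    """Split x into vectors x_1, x_2, ... with disjoint supports:
    x_1 keeps the sizes[0] largest coordinates of x in absolute value,
    x_2 keeps the next sizes[1] largest, and so on."""
    if sum(sizes) != len(x):
        raise ValueError("block sizes must sum to the dimension of x")
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))  # stable sort
    blocks, start = [], 0
    for size in sizes:
        piece = [0.0] * len(x)
        for i in order[start:start + size]:
            piece[i] = x[i]
        blocks.append(piece)
        start += size
    return blocks


def sup_norm(v):
    return max(abs(t) for t in v)


def unit_vector(raw):
    """Normalize raw to Euclidean norm 1."""
    norm = math.sqrt(sum(t * t for t in raw))
    return [t / norm for t in raw]
```

With block sizes that at least double at each step, the bound on $\|x_j\|_\infty$ improves geometrically with $j$.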

If for all J = 1, . . . , L ||xj||2 < «-'~V2, then 

II^II2<LII^.-Il2<2-T^<^- 

Hence, there exists a j such that HxjUg > a^^^ jl. Let j be the largest 
number satisfying this inequality. Then the vector u = J2i=j+i sat- 
isfies ||m||2 < Y,i=j+i W^ih ^ 

Assume that \\Ax - y\\^ < a^-^{R/2 - 2a \\A\\^^-J. Then 
j 



i=l 



<a^-\R/2-2a\\A\\,^,) + \\A\\,^,-\\u\\, 



<a^'~^(i?/2-a||..||2^i, 

Set J = supp(a:j), z = Xj/WxjW^, and w = ^i)/ \\^j\\2- Since 

||xj||2 > a^~^/2, w G Qj and the inequality above implies 
||74(tf; + z) — y/ ||a;j||2||j^ < -R — 2a ||A||2_j^-^. Hence, the assumption 
above implies that the event ^2 occurs. Therefore, 



P (3x e 5"-^ \\Ax - y\\, < ^^^^^^ 



<P(^3xg5"-^ \\Ax-y\\^<a^-^(^^-2a\\A\\^^-^j and \\A\\^^^ < 
Since ||y4||2_^-^ < ^/N \\A\\, the lemma is proved. □ 



8. Lower bounds for £i and ^2 norms 

In this section we use the chaining Lemma 7.1 to prove Theorems 1.6
and 1.5. Actually we will prove a statement, which is stronger than
Theorem 1.6.

Theorem 8.1. Let $K, q, n, d$ be natural numbers. Assume that

\[ n \le \frac{c\,d^K}{\log_{(q)} d}, \]

and let $A_1, \dots, A_K$ be $d \times n$ matrices with independent $\delta$ random entries.
Then for any $y \in \mathbb{R}^{d^K}$

\[ \mathbb{P}\bigl( \exists x \in S^{n-1} \ \ \|(A_1 \otimes_r \cdots \otimes_r A_K)x - y\|_1 \le c'd^K \bigr) \le C'\exp(-cd). \]

Proof. Assume first that $d^K \ge n \ge d^{K-1/2}$, so the condition of Lemma
5.9 holds for $k = K - 1$. Set $R = cd^K$, where $c$ is the constant from
Lemma 6.6. Set $\alpha = 8c/C'$, where $C'$ is the constant from Theorem 1.3.
By this Corollary, 

R 

P (||Ai 0r ■■■®r^K\\ > ^) < exp {-cd) . 

8avd^ 



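Denote $U = A_1 \otimes_r \cdots \otimes_r A_{K-1}$, and let $\mathcal{W}$ be the set of all $d^{K-1} \times n$
matrices $A$ satisfying

\[ \mathbb{P}\Bigl( \|A \otimes_r A_K\| \ge \frac{R}{8\alpha\sqrt{d^K}} \Bigr) \le \exp(-c'd), \]

where $c' = c/2$. By Chebychev's inequality $\mathbb{P}(U \in \mathcal{W}^c) \le \exp(-c'd)$.
Let $y \in \mathbb{R}^{d^K}$. By Lemmata 5.9 and 5.12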

P (3X G 5"^^ I II (Ai ®r---^r ^k)x - y 111 < Crf^) 

< P(3x G ^""^ I \\{U (g)r Ak)x - y\\^ < cd^^ and U G Wx-i n VnW) 

This estimate shows that it is enough to bound the conditional probability

\[ \mathbb{P}\bigl( \exists x \in S^{n-1} \mid \|(U \otimes_r A_K)x - y\|_1 \le cd^K \bigm| U \bigr) \]

for all matrices $U \in \mathcal{W}_{K-1} \cap \mathcal{V}_{K-1} \cap \mathcal{W}$. This bound is based on Lemma 7.1.
Fix a matrix $U \in \mathcal{W}_{K-1} \cap \mathcal{V}_{K-1} \cap \mathcal{W}$ for the rest of the proof. Let $L = K + q$.
It is enough to define numbers $l_1, \dots, l_L \in \mathbb{N}$ and $p_1, \dots, p_L \in (0, 1)$
which satisfy the conditions of Lemma 7.1. These numbers will be
constructed differently for $j \le K$ and $j > K$. The difference between
these cases stems from the different behavior of the bound in Lemma
6.6. For relatively small $l_j$ the $\ell_\infty$ norm of a vector $x$ is large, and
the second term in Lemma 6.6 is negligible compared to the first one.
However, for Ij > cd^ / \og^ n the picture is opposite, and the second 
term is dominating. 

We consider the case I < j < K first. Set /o = 1 and cq = 1. For 
1 <j < K set 

Cjd^ 

(8.1) = ^— , 

[log-' d\ 
where the constants $c_1, \dots, c_K$ will be defined inductively. Assume that
$c_1, \dots, c_{j-1}$ are already defined. Applying Lemma 6.6 to any vector

$x \in S^{n-1}$ with $\|x\|_\infty \le l_{j-1}^{-1/2}$, we get

/ cd^ \ 

P (||(f/ (S)r Ak)x -y\\,< cd"") < exp (-crf/,„i) + exp 

V log nj 

where we can take c^_^ = c ■ Cj-i/2. Inequality (17. ip reads 



> 8-^ ■ log 



log-' ^ d log-' d V Cj '^-' a-' 
Since n < d^ , this inequality follows from 



8ci , / 6e(i 



Therefore, we can choose $c_j$ independently of $d$, so that the inequality
above is satisfied. Thus, the sequence $l_1, \dots, l_K$ satisfies condition (7.1).
Also, if $d \ge d_0$ for some $d_0$ depending only on $K$ and $\delta$, then $l_{j+1} \ge 2l_j$
for all j = 1. 

Let us now define the numbers $l_{K+s}$ for $s = 1, \ldots, q + 1$. To this end
define the sequence $\{\beta_s\}_{s=0}^{q}$ by induction. Set

/3o = — and 

ck 

13s = clogfi) (6e/3^„i) for 1 < s < g, 
where the number c > 1 will be chosen below. For $0 \le s \le q$ set

$$l_{K+s} = \lfloor d^K / \beta_s \rfloor.$$

Note that for $s = 0$ this formula agrees with (8.1). Let $1 \le s \le q$. By
Lemma ESI any vector x G S^"-^ with < satisfies 

$$\mathbb{P}\bigl(\|(U \otimes_r A_K)x - y\|_1 < c d^K\bigr)$$

The last inequality follows from $d\, l_{K+s-1} \ge$ . In this case, condition
(7.1) reads

cd'^ ( Gen 
—7 ^ > 81k+s ■ log ^ 

which can be rewritten as 

c 8 , / 6en(3s 
> ^ ■ log 



Since the sequence $\{\beta_s\}_{s=0}^{q}$ is decreasing, and $n \le d^K$, the previous
inequality holds, provided



/3.>-log^"^(e/3._i: 
c 



log(6e/3,_i) + {K + q) log - 



Since by the definition of /3s, log(e/3s_i) > 1, we can choose 

8 1 
c = - ■ (i^ + g) log — . 

c a 

The inductive definition of the numbers is complete, and
the sequences $l_1, \ldots, l_{K+q}$, $p_1, \ldots, p_{K+q}$ satisfy condition (7.1). Also, if
$d \ge d_1$ for some $d_1$ depending only on $K$, $q$, and $\delta$, then $\beta_{s+1} \le \beta_s/2$, and
so $l_{K+s+1} \ge 2 l_{K+s}$ for $s = 0, \ldots, q-1$.

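Set $\bar n = \sum_{j=1}^{L} l_j$. Then $l_{K+q} \le \bar n \le 2 l_{K+q}$, since each term of the sequence is at least twice the previous one. From the definition of
$\beta_s$ and induction it follows that $1 \le \beta_s \le c' \log_{(s)} d$ for all $s = 1, \ldots, q$.
Hence, there exists $c > 0$ depending only on $K$, $q$, $\delta$ such that

$$\frac{c d^K}{\log_{(q)} d} \le \bar n \le d^K.$$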

Thus, for $d \ge \max(d_0, d_1)$ and $n = \bar n$, the assertion of Theorem 8.1
follows from Lemma 7.1. It automatically extends to all $n < \bar n$, since
for any $y \in \mathbb{R}^{d^K}$ the quantity

$$\min_{x \in S^{\bar n - 1}} \|(A_1 \otimes_r \cdots \otimes_r A_K)x - y\|_1$$

can only increase if we take the minimum over $S^{\bar n - 1} \cap \mathbb{R}^n$ instead of
the whole sphere, in other words, if we consider the submatrix of $A_1 \otimes_r
\cdots \otimes_r A_K$ consisting of the first $n$ columns. It can also be extended
automatically to the case $d < \max(d_0, d_1)$ by choosing a large constant $C$
in the formulation of the Theorem. The proof is now complete. □

Remark 8.2. The probability estimate of Theorem 8.1 is actually optimal.
Indeed, let $y = 0$, and assume that the entries of the matrices
$A_1, \ldots, A_K$ are i.i.d. random variables taking values $0, 1, -1$ with
probability $1/3$ each. Then with probability $(1/3)^d$, the first column of $A_1$
is 0, and so the first column of $A_1 \otimes_r \cdots \otimes_r A_K$ is 0 as well.
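
The mechanism in Remark 8.2 can be checked numerically. The following is a minimal sketch, not code from the paper: the helper `row_product`, the ordering of the rows it produces, and the sizes `d`, `n`, `K` are all choices made here for illustration. The zero first column of $A_1$ is forced by hand rather than waited for, since the event has probability $(1/3)^d$.

```python
import numpy as np

def row_product(A, B):
    """Row product of A (d1 x n) and B (d2 x n): a (d1*d2) x n matrix whose
    rows are the entry-wise products of a row of A with a row of B.
    The row ordering is an arbitrary choice made for this sketch."""
    n = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, n)

rng = np.random.default_rng(0)
d, n, K = 4, 6, 3  # illustrative sizes only

# Entries take the values 0, 1, -1 with probability 1/3 each, as in the remark.
mats = [rng.choice([-1, 0, 1], size=(d, n)) for _ in range(K)]

# Force the event of Remark 8.2: the first column of A_1 vanishes.
mats[0][:, 0] = 0

P = mats[0]
for M in mats[1:]:
    P = row_product(P, M)

print(P.shape)               # (64, 6), i.e. d**K rows and n columns
print(np.all(P[:, 0] == 0))  # True: the first column of the row product vanishes
```

On this event the first coordinate vector $x = e_1$ gives $(A_1 \otimes_r \cdots \otimes_r A_K)x = 0$, so the minimum over the sphere is $0$ for $y = 0$.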



We conclude with the proof of Theorem 1.5. Set $A = A_1 \otimes_r \cdots \otimes_r A_K$.
By Theorem 1.6, with probability at least $1 - \exp(-cd)$,

$$\|A\| \le C' d^{K/2} \quad \text{and} \quad \forall x \in S^{n-1} \ \ \|Ax\|_1 \ge c\, d^K.$$

Then for any $x \in S^{n-1}$

$$c\, d^{K/2} \le d^{-K/2} \|Ax\|_1 \le \|Ax\|_2 \le \|A\| \cdot \|x\|_2 \le C' d^{K/2},$$

so all these norms are equivalent. Comparison between the first and
the third term of this inequality implies Theorem 1.5. Moreover, as in
[11], we can conclude that $A\mathbb{R}^n$ is a Kashin subspace of $\mathbb{R}^{d^K}$, i.e. the
$L_1$ and $L_2$ norms are equivalent on it. More precisely, this establishes
the following corollary.

Corollary 8.3. Under the conditions of Theorem 1.6,

$$\mathbb{P}\bigl(\forall y \in A\mathbb{R}^n \ \ \|y\|_1 \le d^{K/2} \|y\|_2 \le C'' \|y\|_1\bigr) \ge 1 - \exp(-cd).$$
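
The two-sided comparison in Corollary 8.3 can be probed numerically. The sketch below is illustrative only and is not the paper's method: it assumes random $\pm 1$ entries (one distribution chosen here, which may or may not match the hypotheses of Theorem 1.6), uses a hypothetical helper `kashin_ratios`, and samples random directions rather than taking a supremum, so it cannot certify the constant $C''$.

```python
import numpy as np

def row_product(A, B):
    """Row product: rows are entry-wise products of a row of A with a row of B."""
    n = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, n)

def kashin_ratios(d=6, K=2, n=8, trials=200, seed=0):
    """Return d^{K/2} * ||y||_2 / ||y||_1 for random vectors y in the range of
    A = A_1 (x)_r ... (x)_r A_K, with +-1 entries (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    A = rng.choice([-1.0, 1.0], size=(d, n))
    for _ in range(K - 1):
        A = row_product(A, rng.choice([-1.0, 1.0], size=(d, n)))
    X = rng.standard_normal((n, trials))  # random directions in R^n
    Y = A @ X                             # columns lie in the range of A
    l1 = np.abs(Y).sum(axis=0)
    l2 = np.sqrt((Y ** 2).sum(axis=0))
    return d ** (K / 2) * l2 / l1

ratios = kashin_ratios()
print(ratios.min(), ratios.max())
```

The lower bound `ratios >= 1` always holds by the Cauchy-Schwarz inequality in $\mathbb{R}^{d^K}$. The content of the corollary is that the ratio also stays below a constant uniformly over the whole subspace, which a finite sample can only suggest.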